Strings are sequences of characters (like hello). Each character is an 8-bit value from the entire 256-character set (unlike in some languages, there's nothing special about the NUL character).
The shortest possible string has no characters. The longest string fills all of your available memory (although you wouldn't be able to do much with that). This is in accordance with the principle of "no built-in limits" that Perl follows at every opportunity. Typical strings are printable sequences of letters and digits and punctuation in the ASCII 32 to ASCII 126 range. However, the ability to have any character from 0 to 255 in a string means you can create, scan, and manipulate raw binary data as strings - something with which most other utilities would have great difficulty. For example, you could update a graphical image or compiled program by reading it into a Perl string, making the change, and writing the result back out.
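As a minimal sketch of the binary-data point above, the following builds a string in memory rather than reading a real file. The variable names are ours, chosen for illustration, and variables themselves are covered later.

```perl
use strict;
use warnings;

# Build a string holding every byte value from 0 to 255, in order.
my $all_bytes = join '', map { chr($_) } 0 .. 255;

# A NUL (value 0) in the middle of a string does not end it.
my $with_nul = "abc\0def";

# Change one byte of "binary" data, the way you might patch an
# image or a program after reading it into a string.
my $patched = $all_bytes;
substr($patched, 10, 1, chr(255));

print length($all_bytes), "\n";            # 256
print length($with_nul), "\n";             # 7
print ord(substr($patched, 10, 1)), "\n";  # 255
print length(''), "\n";                    # 0
```

The original string is untouched by the patch, because the change was made to a copy.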
Like numbers, strings have a literal representation, which is the way you represent the string in a Perl program. Literal strings come in two different flavors: single-quoted string literals and double-quoted string literals. [1]
A single-quoted string literal is a sequence of characters enclosed in single quotes. The single quotes are not part of the string itself; they're just there to let Perl identify the beginning and the ending of the string. Any character other than a single quote or a backslash between the quote marks (including newline characters, if the string continues onto successive lines) stands for itself inside a single-quoted string literal. To get a backslash, put two backslashes in a row, and to get a single quote, put a backslash followed by a single quote. In other words:
    'hello' # five characters: h, e, l, l, o
    'don\'t' # five characters: d, o, n, single-quote, t
    '' # the null string (no characters)
    'silly\\me' # silly, followed by backslash, followed by me
    'hello\n' # hello followed by backslash followed by n
    'hello
    there' # hello, newline, there (11 characters total)
    'Don\'t let an apostrophe end this string prematurely!'
    'the last character of this string is a backslash: \\'
    '\'\\' # single quote followed by backslash
Note that the \n within a single-quoted string is not interpreted as a newline, but as the two characters backslash and n. (Only when the backslash is followed by another backslash or a single quote does it have special meaning.)
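One way to check these rules for yourself is to ask Perl for the length of each string. This is a small sketch; the variable names are ours, and length is a built-in function that returns the number of characters in a string.

```perl
use strict;
use warnings;

my $plain     = 'hello';      # h, e, l, l, o
my $apos      = 'don\'t';     # backslash-quote yields one single quote
my $backslash = 'silly\\me';  # two backslashes yield one backslash
my $no_escape = 'hello\n';    # backslash and n stay as two characters

print length($plain), "\n";      # 5
print length($apos), "\n";       # 5
print length($backslash), "\n";  # 8
print length($no_escape), "\n";  # 7
```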
A double-quoted string literal is similar to the strings you may have seen in other languages. Once again, it's a sequence of characters, although this time enclosed in double quotes. But now the backslash takes on its full power to specify certain control characters, or even any character at all through octal and hex representations. Here are some double-quoted strings:
    "barney" # just the same as 'barney'
    "hello world\n" # hello world, and a newline
    "new \177" # new, space, and the delete character (octal 177)
    "The last character of this string is a quote mark: \""
    "coke\tsprite" # coke, a tab, and sprite
Note that the double-quoted literal string "barney" means the same six-character string to Perl as does the single-quoted literal string 'barney'. It's like what we saw with numeric literals, where 0377 was another way to write 255.0. Perl lets you write the literal in the way that makes more sense to you. Of course, if you wish to use a backslash escape (like \n to mean a newline character), you'll need to use double quotes.
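The following sketch compares the two kinds of literal and measures a few double-quoted strings. The variable names are ours; ord is a built-in function that returns the numeric value of a character.

```perl
use strict;
use warnings;

my $single = 'barney';
my $double = "barney";
my $same   = ($single eq $double) ? "yes" : "no";

my $line   = "hello world\n";  # 11 characters plus a newline
my $del    = "new \177";       # n, e, w, space, delete
my $tabbed = "coke\tsprite";   # 4 + 1 + 6 characters

print "same string: ", $same, "\n";  # same string: yes
print length($line), "\n";           # 12
print length($del), "\n";            # 5
print ord(substr($del, -1)), "\n";   # 127
print length($tabbed), "\n";         # 11
```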
The backslash can precede many different characters to mean different things (generally called a backslash escape). The complete list of double-quoted string escapes is given in Table 2.1.
| Construct | Meaning |
|---|---|
| \n | Newline |
| \r | Return |
| \t | Tab |
| \f | Formfeed |
| \b | Backspace |
| \a | Bell |
| \e | Escape |
| \007 | Any octal ASCII value (here, 007 = bell) |
| \x7f | Any hex ASCII value (here, 7f = delete) |
| \cC | Any "control" character (here, CTRL-C) |
| \\ | Backslash |
| \" | Double quote |
| \l | Lowercase next letter |
| \L | Lowercase all following letters until \E |
| \u | Uppercase next letter |
| \U | Uppercase all following letters until \E |
| \Q | Quote non-word characters by adding a backslash until \E |
| \E | Terminate \L, \U, or \Q |
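To see a few of the escapes from Table 2.1 in action, here is a short sketch. The sample strings and variable names are ours, picked only to make the effect of each escape visible.

```perl
use strict;
use warnings;

# Several ways to name the same control characters.
my $bell_octal = "\007";
my $bell_named = "\a";
my $del_hex    = "\x7f";
my $ctrl_c     = "\cC";

# Case-shifting and quoting escapes.
my $upper_first = "\ufred";                # Fred
my $lower_first = "\lFRED";                # fRED
my $upper_all   = "\Ufred\E and barney";   # FRED and barney
my $lower_all   = "\LFRED\E AND BARNEY";   # fred AND BARNEY
my $quoted      = "\Qa.b*c\E";             # a\.b\*c

print ord($bell_octal), " ", ord($bell_named), "\n";  # 7 7
print ord($del_hex), "\n";                            # 127
print ord($ctrl_c), "\n";                             # 3
print $upper_first, "\n";
print $lower_first, "\n";
print $upper_all, "\n";
print $lower_all, "\n";
print $quoted, "\n";
```

Note that \E ends the effect of \U, \L, or \Q, so the text after it is left alone.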
Another feature of double-quoted strings is that they are variable interpolated, meaning that scalar (and array) variables within the strings are replaced with their current values when the strings are used. We haven't formally been introduced to what a variable looks like yet, so we'll get back to this later.
A quick note here about using DOS/Win32 pathnames in double-quoted strings: while Perl accepts either backslashes or forward slashes in path names, backslashes need to be escaped. So, you need to write one of the following:
    "c:\\temp" # use an escaped backslash
    "c:/temp" # use a Unix-style forward slash
If you forget to escape the backslash, you'll end up with strange results:
    "c:\temp" # WRONG - this string contains a c:, a TAB, and emp
If you're already used to using pathnames in C/C++, this notation will be second nature to you. Otherwise, beware: pathnames seem to bite each and every Perl-for-Win32 programmer from time to time.
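The difference is easy to confirm by measuring the strings. This is a sketch with variable names of our own; the single-quoted form is not mentioned above, but it follows from the single-quote rules described earlier, where a backslash before t stands for itself.

```perl
use strict;
use warnings;

my $escaped = "c:\\temp";  # c, :, backslash, t, e, m, p
my $forward = "c:/temp";   # forward slash needs no escaping
my $single  = 'c:\temp';   # single quotes leave \t alone
my $wrong   = "c:\temp";   # \t becomes a TAB

print length($escaped), "\n";      # 7
print length($forward), "\n";      # 7
print length($single), "\n";       # 7
print length($wrong), "\n";        # 6
print index($wrong, "\t"), "\n";   # 2 (the TAB is the third character)
```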
String values can be concatenated with the "." operator. (Yes, that's a single period.) This does not alter either string, any more than 2+3 alters either 2 or 3. The resulting (longer) string is then available for further computation or to be stored into a variable.
    "hello" . "world" # same as "helloworld"
    'hello world' . "\n" # same as "hello world\n"
    "fred" . " " . "barney" # same as "fred barney"
Note that the concatenation must be explicitly called for with the "." operator. You can't just stick the two values close to each other.
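Here is a short sketch of concatenation that also checks that the operands are left unchanged. The variable names are ours.

```perl
use strict;
use warnings;

my $first  = "hello";
my $second = "world";

my $joined = $first . $second;           # helloworld
my $line   = 'hello world' . "\n";       # hello world, then a newline
my $names  = "fred" . " " . "barney";    # fred barney

print $joined, "\n";
print $line;
print $names, "\n";
print $first, "\n";   # still hello; concatenation did not alter it
```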
Strings also have their own set of comparison operators. These operators are FORTRAN-like, as in lt for less-than, and so on. The operators compare the strings character by character, from left to right, using the ASCII values of the characters. The complete set of comparison operators (for both numbers and strings) is given in Table 2.2.
| Comparison | Numeric | String |
|---|---|---|
| Equal | == | eq |
| Not equal | != | ne |
| Less than | < | lt |
| Greater than | > | gt |
| Less than or equal to | <= | le |
| Greater than or equal to | >= | ge |
You may wonder why there are separate operators for numbers and strings, if numbers and strings are automatically converted back and forth. Consider the two values 7 and 30. If compared as numbers, 7 is obviously less than 30, but if compared as strings, the string "30" comes before the string "7" (because the ASCII value for 3 is less than the value for 7), and hence is less. Perl always requires you to specify the proper type of comparison, whether it be numeric or string.
Note that if you come from a UNIX shell programming background, the numeric and string comparisons are roughly opposite of what they are for the UNIX test command, which uses -eq for numeric comparison and = for string comparison.
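The 7 and 30 example can be tried directly. In this sketch the variable names are ours, and each comparison is turned into the word true or false so the result is easy to read.

```perl
use strict;
use warnings;

# Compared as numbers, 7 is less than 30.
my $numeric_lt = (7 < 30) ? "true" : "false";

# Compared as strings, "7" is not less than "30",
# because the character 7 sorts after the character 3.
my $string_lt  = ("7" lt "30") ? "true" : "false";
my $string_gt  = ("7" gt "30") ? "true" : "false";

# Equality depends on the kind of comparison too.
my $numeric_eq = ("1.0" == 1)   ? "true" : "false";
my $string_eq  = ("1.0" eq "1") ? "true" : "false";

print "7 < 30:       ", $numeric_lt, "\n";  # true
print "\"7\" lt \"30\":  ", $string_lt, "\n";   # false
print "\"7\" gt \"30\":  ", $string_gt, "\n";   # true
print "\"1.0\" == 1:   ", $numeric_eq, "\n";  # true
print "\"1.0\" eq \"1\": ", $string_eq, "\n";   # false
```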
Still another string operator is the string repetition operator, consisting of the single lowercase letter x.  This operator takes its left operand (a string), and makes as many concatenated copies of that string as indicated by its right operand (a number). For example:
    "fred" x 3 # is "fredfredfred"
    "barney" x (4+1) # is "barney" x 5, or
    # "barneybarneybarneybarneybarney"
    (3+2) x 4 # is 5 x 4, or really "5" x 4, which is "5555"
That last example is worth spelling out slowly. The parentheses on (3+2) force this part of the expression to be evaluated first, yielding five. (The parentheses here are working as in standard math.) But the string repetition operator wants a string for a left operand, so the number 5 is converted to the string "5" (using rules described in detail later), a one-character string. This new string is then copied four times, yielding the four-character string 5555. If we had reversed the order of the operands, we would have made five copies of the string 4, yielding 44444. This shows that string repetition is not commutative.
If necessary, the copy count (the right operand) is first truncated to an integer value (4.8 becomes 4) before being used. A copy count of less than one results in an empty (zero-length) string.
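The following sketch runs through the repetition cases described above, including a fractional and a zero copy count. The variable names are ours, and each result is stored in a scalar variable before being printed.

```perl
use strict;
use warnings;

my $three     = "fred" x 3;        # fredfredfred
my $five      = "barney" x (4+1);  # five copies of barney
my $digits    = (3+2) x 4;         # 5 becomes "5", copied four times
my $reversed  = 4 x (3+2);         # 4 becomes "4", copied five times
my $truncated = "ab" x 4.8;        # count is truncated to 4
my $empty     = "ab" x 0;          # count below one gives ''

print $three, "\n";           # fredfredfred
print length($five), "\n";    # 30
print $digits, "\n";          # 5555
print $reversed, "\n";        # 44444
print $truncated, "\n";       # abababab
print length($empty), "\n";   # 0
```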
[1] There are also the here strings, similar to the UNIX shell's here documents, which are covered in the "Intermediate" Perl lessons.