PHP String Handling

Problematic code to find a substring inside a larger string

if (strpos($aString,'Waldo') != false) {
   echo 'I found Waldo!';
}
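The code above fails when 'Waldo' appears at the very beginning of $aString: strpos() then returns 0, and the loose comparison 0 != false evaluates to false, so the match is missed. Always use !== instead of != when testing the return value of strpos().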
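To see the failure concretely, here is a minimal sketch that compares the loose and strict checks when the needle sits at position 0. The sample string and the variable names are illustrative assumptions, not part of the original example.

```php
<?php
// Hypothetical input: 'Waldo' is at the very start, so strpos() returns 0.
$aString = 'Waldo is here';
$position = strpos($aString, 'Waldo');

// Loose comparison: 0 != false is false, so the match is missed.
$looseFound = ($position != false);

// Strict comparison: 0 !== false is true, so the match is reported.
$strictFound = ($position !== false);

var_dump($looseFound, $strictFound);
```

Only the strict check reports the match.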
Strings And Patterns
As we mentioned in the PHP Basics chapter, strings wear many hats in PHP—far
from being relegated to mere collections of textual characters, they can be used to
store binary data of any kind—as well as text encoded in a way that PHP does not
understand natively, but that one of its extensions can manipulate directly.
String manipulation is a very important skill for every PHP developer, a fact that is
reflected in the number of exam questions that either revolve directly around strings
or that require a firm grasp on the way they work. Therefore, you should ensure that
you are very familiar with them before taking the exam.
Keep in mind, however, that strings are a vast topic; once again, we focus on the
PHP features that are most likely to be relevant to the Zend exam.

String Basics
Strings can be defined using one of several methods. Most commonly, you will encapsulate
them in single quotes or double quotes. Unlike some other languages,
these two methods behave quite differently: single quotes represent “simple strings,”
where almost all characters are used literally. Double quotes, on the other hand, encapsulate
“complex strings” that allow for special escape sequences (for example, to
insert special characters) and for variable substitution, which makes it possible to
embed the value of a variable directly in a string, without the need for any special
operator.

Escape sequences are sometimes called control characters and take the form of a
backslash (\) followed by one or more characters. Perhaps the most common escape
sequence is the newline character \n. In the following example, we use hex and octal
notation to display an asterisk:
echo "\x2a";
echo "\052";

Variable Interpolation
Variables can be embedded directly inside a double-quote string by simply typing
their name. For example:
$who = "World";
echo "Hello $who\n"; // Shows "Hello World" followed by a newline
echo 'Hello $who\n'; // Shows "Hello $who\n"
Clearly, this “simple” syntax won’t work in those situations in which the name of the
variable you want to interpolate is positioned in such a way inside the string that
the parser wouldn’t be able to parse its name in the way you intend it to. In these
cases, you can encapsulate the variable’s name in braces:
$me = 'Davey';
$names = array('Smith', 'Jones', 'Jackson');
echo "There cannot be more than two {$me}s!";
echo "Citation: {$names[1]}[1987]";
In the first example above, the braces help us append a hard-coded letter “s” to the
value of $me. Without them, the parser would look for the variable $mes, which
does not exist. In the second example, the braces mark exactly where the array
expression ends, so the trailing [1987] is clearly literal text rather than part of
the variable reference.

The Heredoc Syntax
A third syntax, called heredoc, can be used to declare complex strings—in general,
the functionality it provides is similar to double quotes, with the exception that, because
heredoc uses a special set of tokens to encapsulate the string, it’s easier to
declare strings that include many double-quote characters.
A heredoc string is delimited by the special operator <<< followed by an identifier.
You must then close the string using the same identifier, optionally followed by a
semicolon, placed at the very beginning of its own line (that is, it should not be preceded
by whitespace). Heredoc identifiers must follow the same rules as variable
naming (explained in the PHP Basics chapter), and are similarly case-sensitive.
The heredoc syntax behaves like double quotes in every way, meaning that variables
and escape sequences are interpolated:
$who = "World";
echo <<<TEXT
So I said, "Hello $who"
TEXT;
The above code will output So I said, “Hello World”. Note how the newline characters
right after the opening token and right before the closing token are ignored.
Heredoc strings can be used in almost all situations in which a string is an appropriate
value. The only exception is the declaration of a class property (explained in
the Object Oriented Programming With PHP chapter), where their use will result in a
parser error:
class Hello {
public $greeting = <<<EOT
Hello World
EOT;
}

Escaping Literal Values
All three string-definition syntaxes feature a set of several characters that require escaping
in order to be interpreted as literals.
When using single-quote strings, single quote characters can be escaped using a
backslash:
echo 'This is \'my\' string';
A similar set of escaping rules applies to double-quote strings, where double quote
characters and dollar signs can also be escaped by prefixing them with a backslash:
$a = 10;
echo "The value of \$a is \"$a\".";
Backslashes themselves can be escaped in both cases using the same technique:
echo "Here's an escaped backslash: - \\ -";
Note that you cannot escape a brace—therefore, if you need the literal string {$ to be
printed out, you need to escape the dollar sign in order to prevent the parser from
interpreting the sequence as an attempt to interpolate a variable:
echo "Here’s a literal brace + dollar sign: {\$";
Heredoc strings provide the same escaping mechanisms as double-quote strings,
with the exception that you do not need to escape double quote characters, since
they have no semantic value.

Determining the Length of a String
The strlen() function is used to determine the length, in bytes, of a string. Note that
strlen(), like most string functions, is binary-safe. This means that all characters
in the string are counted, regardless of their value. (In some languages (notably C),
some functions are designed to work with “zero-terminated” strings, where the NUL
character is used to signal the end of a string. This causes problems when dealing
with binary objects, since bytes with a value of zero are quite common; luckily, most
PHP functions are capable of handling binary data without any problem.)
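As a small illustration of binary safety, the sketch below measures a string that contains a NUL byte, plus a two-byte UTF-8 sequence, to show that strlen() counts bytes rather than visible characters. The sample values are assumptions chosen for the demonstration.

```php
<?php
// A five-byte string with a NUL byte in the middle.
$binary = "ab\0cd";
$byteCount = strlen($binary);       // the NUL byte is counted like any other

// strlen() counts bytes, not characters: "é" occupies two bytes in UTF-8.
$accented = "\xC3\xA9";             // the UTF-8 encoding of "é"
$accentedBytes = strlen($accented);

echo $byteCount, ' ', $accentedBytes, "\n";
```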

Transforming a String
The strtr() function can be used to translate certain characters of a string into other
characters. It is often used as an aid in the practice known as transliteration, to transform certain
accented characters that cannot appear, for example, in URLs or e-mail
addresses into the equivalent unaccented versions:
// Single character version
echo strtr('abc', 'a', '1'); // Outputs 1bc
// Multiple-character version
$subst = array(
    '1' => 'one',
    '2' => 'two',
);
echo strtr('123', $subst); // Outputs onetwo3

Using Strings as Arrays
You can access the individual characters of a string as if they were members of an
array. For example:
$string = 'abcdef';
echo $string[1]; // Outputs 'b'
This approach can be very handy when you need to scan a string one character at a
time:
$s = 'abcdef';
for ($i = 0; $i < strlen($s); $i++) {
    if ($s[$i] > 'c') {
        echo $s[$i];
    }
}

Note that string character indices are zero-based—meaning that the first character of
an arbitrary string $s has an index of zero, and the last has an index of strlen($s)-1.

Comparing, Searching and Replacing Strings
Comparison is, perhaps, one of the most common operations performed on
strings. At times, PHP’s type-juggling mechanisms also make it the most maddening—
particularly because strings that can be interpreted as numbers are often transparently
converted to their numeric equivalent. Consider, for example, the following
code:
$string = '123aa';
if ($string == 123) {
// The string equals 123
}
You’d expect this comparison to return false, since the two operands are most definitely
not the same. However, PHP versions before 8.0 first transparently convert the contents of
$string to the integer 123, thus making the comparison true (PHP 8.0 and later compare
the two as strings here and return false). Naturally, the best way
to avoid this problem is to use the identity operator === whenever you are performing
a comparison that could potentially lead to type-juggling problems.
In addition to comparison operators, you can also use the specialized functions
strcmp() and strcasecmp() to match strings. These are identical, with the exception
that the former is case-sensitive, while the latter is not. In both cases, a result of zero
indicates that the two strings passed to the function are equal:
$str = "Hello World";
if (strcmp($str, "hello world") === 0) {
// We won’t get here, because of case sensitivity
}
if (strcasecmp($str, "hello world") === 0) {
// We will get here, because strcasecmp()
// is case-insensitive
}

A further variant of strcasecmp(), strncasecmp(), allows you to test only a given number
of characters inside two strings. For example:
$s1 = 'abcd1234';
$s2 = 'abcd5678';
// Compare the first four characters
echo strncasecmp($s1, $s2, 4); // Outputs 0
You can also perform a comparison between portions of strings by using the
substr_compare() function.
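A brief sketch of substr_compare() follows. The sample strings are assumptions; the third argument is the offset in the first string at which the comparison starts, and the optional fourth and fifth arguments limit the length and switch to a case-insensitive comparison.

```php
<?php
$greeting = 'Hello World';

// Compare everything from offset 6 onwards with 'World'.
$exact = substr_compare($greeting, 'World', 6);

// Compare five characters from offset 6, ignoring case.
$ignoreCase = substr_compare($greeting, 'WORLD', 6, 5, true);

// A different word at the same offset does not compare equal.
$different = substr_compare($greeting, 'Reader', 6);

var_dump($exact, $ignoreCase, $different !== 0);
```

As with strcmp(), a result of zero means the compared portions are equal.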

Simple Searching Functionality
PHP provides a number of very powerful search facilities whose functionality varies
from the very simple (and correspondingly faster) to the very complex (and correspondingly
slower).
The simplest way to search inside a string is to use the strpos() and strstr() families
of functions. The former allows you to find the position of a substring (usually
called the needle) inside a string (called the haystack). It returns either the numeric
position of the needle’s first occurrence within the haystack, or false if a match could
not be found. Here’s an example:
$haystack = "abcdefg";
$needle = 'abc';
if (strpos ($haystack, $needle) !== false) {
echo ’Found’;
}
Note that, because strings are zero-indexed, it is necessary to use the identity operators
when calling strpos() to ensure that a return value of zero—which indicates
that the needle occurs right at the beginning of the haystack—is not mistaken for a
return value of false.

You can also specify an optional third parameter to strpos() to indicate that you
want the search to start from a specific position within the haystack. For example:
$haystack = '123456123456';
$needle = '123';
echo strpos($haystack, $needle); // outputs 0
echo strpos($haystack, $needle, 1); // outputs 6
The strstr() function works similarly to strpos() in that it searches the haystack
for a needle. The only real difference is that this function returns the portion of the
haystack that starts with the needle instead of the latter’s position:
$haystack = '123456';
$needle = '34';
echo strstr($haystack, $needle); // outputs 3456
In general, strstr() is slower than strpos(); therefore, you should use the latter if
your only goal is to determine whether a certain needle occurs inside the haystack.
Also, note that you cannot force strstr() to start looking for the needle from a given
location by passing a third parameter.
Both strpos() and strstr() are case sensitive and start looking for the needle from
the beginning of the haystack. However, PHP provides variants that work in a case-insensitive
way or start looking for the needle from the end of the haystack. For
example:
// Case-insensitive search
echo stripos('Hello World', 'hello'); // outputs zero
echo stristr('Hello My World', 'my'); // outputs "My World"
// Reverse search
echo strrpos('123123', '123'); // outputs 3

Matching Against a Mask
You can use the strspn() function to match a string against a “whitelist” mask of
allowed characters. This function returns the length of the initial segment of the
string that consists entirely of characters specified in the mask:
$string = '133445abcdef';
$mask = '12345';
echo strspn($string, $mask); // Outputs 6
The strcspn() function works just like strspn(), but uses a blacklist approach instead;
that is, the mask is used to specify which characters are disallowed, and the
function returns the length of the initial segment of the string that does not contain
any of the characters from the mask.
Both strspn() and strcspn() accept two optional parameters that define the starting
position and the length of the string to examine. For example:
$string = '1abc234';
$mask = 'abc';
echo strspn($string, $mask, 1, 4);
In the example above, strspn() will start examining the string from the second character
(index 1), and continue for up to four characters. However, only the first three
characters it encounters satisfy the mask’s constraints and, therefore, the script outputs
3.
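The two functions can be sketched side by side. In this illustrative example (the input strings and variable names are assumptions), strcspn() finds where the first digit appears, and strspn() is compared against strlen() to check that a string consists only of allowed characters.

```php
<?php
$digits = '0123456789';

// Blacklist: length of the leading run that contains no digit at all.
$firstDigitAt = strcspn('abcd1234', $digits);

// Whitelist: a string is "all digits" when the matching run covers all of it.
$clean = '133445';
$dirty = '133445abc';
$cleanIsNumeric = (strspn($clean, $digits) === strlen($clean));
$dirtyIsNumeric = (strspn($dirty, $digits) === strlen($dirty));

var_dump($firstDigitAt, $cleanIsNumeric, $dirtyIsNumeric);
```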
Simple Search and Replace Operations
Replacing portions of a string with a different substring is another very common task
for PHP developers. Simple substitutions are performed using str_replace() (as well
as its case-insensitive variation, str_ireplace()) and substr_replace(). Here’s an
example:
echo str_replace("World", "Reader", "Hello World");
echo str_ireplace("world", "Reader", "Hello World");
In both cases, the function takes three parameters: a needle, a replacement string
and a haystack. PHP will attempt to look for the needle in the haystack (using either
a case-sensitive or case-insensitive search algorithm) and substitute every single instance
of the needle with the replacement string. Optionally, you can specify a fourth
parameter, passed by reference, that the function fills, upon return, with the number
of substitutions made:
$a = 0; // Initialize
str_replace('a', 'b', 'a1a1a1', $a);
echo $a; // outputs 3
If you need to search and replace more than one needle at a time, you can pass the
first two arguments to str_replace() in the form of arrays:
echo str_replace(array("Hello", "World"), array("Bonjour", "Monde"), "Hello World");
echo str_replace(array("Hello", "World"), "Bye", "Hello World");
In the first example, the replacements are made based on array indexes—the first
element of the search array is replaced by the first element of the replacement array,
and the output is “Bonjour Monde”. In the second example, only the needle
argument is an array; in this case, both search terms are replaced by the same string
resulting in “Bye Bye”.
If you need to replace a portion of a string of which you already know the starting
and ending point, you can use substr_replace():
echo substr_replace("Hello World", "Reader", 6);
echo substr_replace("Canned tomatoes are good", "potatoes", 7, 8);
The third argument is our starting point (the letter “W” at index 6 in the first example); the function
replaces the contents of the string from here until the end of the string with the
second argument passed to it, thus resulting in the output Hello Reader. You can
also pass an optional fourth parameter to define the length of the substring that will
be replaced (as shown in the second example, which outputs Canned potatoes are
good).
Combining substr_replace() with strpos() can prove to be a powerful tool. For
example:
$user = "davey@php.net";
$name = substr_replace($user, "", strpos($user, '@'));
echo "Hello " . $name;
By using strpos() to locate the first occurrence of the @ symbol, we can replace the
rest of the e-mail address with an empty string, leaving us with just the username,
which we output in greeting.
Extracting Substrings
The very flexible and powerful substr() function allows you to extract a substring
from a larger string. It takes three parameters: the string to be worked on, a starting
index and an optional length. The starting index can be specified as either a positive
integer (meaning the index of a character in the string starting from the beginning)
or a negative integer (meaning the index of a character starting from the end). Here
are a few simple examples:
$x = '1234567';
echo substr ($x, 0, 3); // outputs 123
echo substr ($x, 1, 1); // outputs 2
echo substr ($x, -2); // outputs 67
echo substr ($x, 1); // outputs 234567
echo substr ($x, -2, 1); // outputs 6
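As with substr_replace(), substr() pairs naturally with the search functions. The sketch below uses strrpos() to find the last dot in a file name and substr() to return what follows it. The function name fileExtension() and the sample file names are hypothetical, introduced only for this illustration.

```php
<?php
// Hypothetical helper: returns the text after the last dot, or '' if none.
function fileExtension($filename)
{
    $dot = strrpos($filename, '.');
    if ($dot === false) {
        return '';                      // no dot, so no extension
    }
    return substr($filename, $dot + 1); // everything after the last dot
}

$withExtension = fileExtension('report.final.pdf');
$withoutExtension = fileExtension('README');
var_dump($withExtension, $withoutExtension);
```

Note the strict === false test on the result of strrpos(), for the same reason it is needed with strpos().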

Formatting Strings
PHP provides a number of different functions that can be used to format output in a
variety of ways. Some of them are designed to handle special data types (for example,
numbers or currency values), while others provide a more generic interface for
formatting strings according to more complex rules.
Formatting rules are sometimes governed by locale considerations. For example,
most English-speaking countries format numbers by using commas as the separators
between thousands, and the point as a separator between the integer portion
of a number and its fractional part. In many European countries, this custom is reversed:
the dot (or a space) separates thousands, and the comma is the fractional
delimiter.
In PHP, the current locale is set by calling the setlocale() function, which takes
two parameters: a category that indicates which functions are affected by the
change, and the name of the locale you want to set. For example, you can affect currency
formatting (which we’ll examine in a few paragraphs) to reflect the standard US rules
by calling setlocale() as in the following example:
setlocale(LC_MONETARY, 'en_US');

Formatting Numbers
Number formatting is typically used when you wish to output a number and separate
its digits into thousands and decimal points. The number_format() function, used for
this purpose, is not locale-aware. This means that, even if you have a French or
German locale set, it will still use periods for decimals and commas for thousands,
unless you specify otherwise.
The number_format() function accepts 1, 2 or 4 arguments (but not three). If only
one argument is given, the default formatting is used: the number will be rounded
to the nearest integer, and a comma will be used to separate thousands. If two arguments
are given, the number will be rounded to the given number of decimal places
and a period and comma will be used to separate decimals and thousands, respectively.
Should you pass in all four parameters, the number will be rounded to the
number of decimal places given, and number_format() will use the first character of
the third and fourth arguments as decimal and thousand separators respectively.
Here are a few examples:
echo number_format("100000.698"); // Shows 100,001
echo number_format("100000.698", 3, ",", " "); // Shows 100 000,698
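The one-, two- and four-argument forms can be compared on a single value. This is a small sketch with an arbitrary sample number; the last call swaps the separators to imitate the continental European convention described earlier.

```php
<?php
$amount = 1234567.891;

// One argument: rounded to an integer, comma between thousands.
$rounded = number_format($amount);

// Two arguments: two decimals, period and comma as separators.
$english = number_format($amount, 2);

// Four arguments: comma for decimals, period between thousands.
$european = number_format($amount, 2, ',', '.');

echo $rounded, "\n", $english, "\n", $european, "\n";
```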

Formatting Currency Values
Currency formatting, unlike number formatting, is locale aware and will display the
correct currency symbol (either international or national notations—e.g.: USD or $,
respectively) depending on how your locale is set.
When using money_format(), we must specify the formatting rules we want to use
by passing the function a specially-crafted string that consists of a percent symbol
(%) followed by a set of flags that determine the minimum width of the resulting output,
its integer and decimal precision and a conversion character that determines
whether the currency value is formatted using the locale’s national or international
rules.
The money_format() function is not available on Windows, as well as on some variants
of UNIX. It was also deprecated in PHP 7.4 and removed in PHP 8.0.
For example, to output a currency value using the American national notation with
two decimal places, we’d use the following function call:
setlocale(LC_MONETARY, "en_US");
echo money_format('%.2n', "100000.698");
This example displays “$100,000.70”.
If we simply change the locale to Japanese, we can display the number in Yen.
setlocale(LC_MONETARY, "ja_JP.UTF-8");
echo money_format('%.2n', "100000.698");

This time, the output is “¥100,000.70”. Similarly, if we change our formatting to use
the i conversion character, money_format() will produce its output using the international
notation, for example:
setlocale(LC_MONETARY, "en_US");
echo money_format('%.2i', "100000.698");
setlocale(LC_MONETARY, "ja_JP");
echo money_format('%.2i', "100000.698");
The first example displays “USD 100,000.70”, while the second outputs “JPY
100,000.70”. As you can see, money_format() is a must for any international commerce
site that accepts multiple currencies, as it allows you to easily display amounts
in currencies that you are not familiar with.
There are two important things that you should keep in mind here. First, a call
to setlocale() affects the entire process inside which it is executed, rather than the
individual script. Thus, you should be careful to always reset the locale whenever
you need to perform a formatting operation, particularly if your application requires
the use of multiple locales, or is hosted alongside other applications that may.
In addition, you should keep in mind that the default rounding rules change from
locale to locale. For example, US currency values are regularly expressed as dollars
and cents, while Japanese currency values are represented as integers. Therefore, if
you don’t specify a decimal precision, the same value can yield very different locale-dependent
formatted strings:
setlocale(LC_MONETARY, "en_US");
echo money_format('%i', "100000.698");
setlocale(LC_MONETARY, "ja_JP");
echo money_format('%i', "100000.698");
The first example displays “USD 100,000.70”; however, the Japanese output is now
“JPY 100,001”—as you can see, this last value was rounded up to the next integer.

Generic Formatting
If you are not handling numbers or currency values, you can use the printf() family
of functions to perform arbitrary formatting of a value. All the functions in this
group perform in an essentially identical way: they take an input string that specifies
the output format and one or more values. The only difference is in the way they return
their results: the “plain” printf() function simply writes it to the script’s output,
while other variants may return it (sprintf()), write it out to a file (fprintf()), and
so on.
The formatting string usually contains a combination of literal text—that is copied
directly into the function’s output—and specifiers that determine how the input
should be formatted. The specifiers are then used to format each input parameter
in the order in which they are passed to the function (thus, the first specifier is used
to format the first data parameter, the second specifier is used to format the second
parameter, and so on).
A formatting specifier always starts with a percent symbol (if you want to insert a
literal percent character in your output, you need to escape it as %%) and is followed
by a type specification token, which identifies the type of formatting to be applied; a
number of optional modifiers can be inserted between the two to affect the output:
• A sign specifier (a plus or minus symbol) to determine how signed numbers are
to be rendered
• A padding specifier that indicates what character should be used to make up
the required output length, should the input not be long enough on its own
• An alignment specifier that indicates if the output should be left or right
aligned
• A numeric width specifier that indicates the minimum length of the output
• A precision specifier that indicates how many decimal digits should be displayed
for floating-point numbers
It is important that you be familiar with some of the most commonly-used type specifiers:
b Output an integer as a Binary number.
c Output the character which has the input integer as its ASCII value.
d Output a signed decimal number
e Output a number using scientific notation (e.g., 3.8e+9)
u Output an unsigned decimal number
f Output a locale aware float number
F Output a non-locale aware float number
o Output a number using its Octal representation
s Output a string
x Output a number as hexadecimal with lowercase letters
X Output a number as hexadecimal with uppercase letters
Here are some simple examples of printf() usage:
$n = 123;
$f = 123.45;
$s = "A string";
printf ("%d", $n); // prints 123
printf ("%d", $f); // prints 123
// Prints "The string is A string"
printf ("The string is %s", $s);
// Example with precision
printf ("%3.3f", $f); // prints 123.450
// Complex formatting
function showError($msg, $line, $file)
{
    return sprintf("An error occurred in %s on ".
        "line %d: %s", $file, $line, $msg);
}
echo showError("Invalid deconfibulator", __LINE__, __FILE__);
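The optional modifiers listed earlier (sign, padding, alignment, width and precision) are easiest to understand by example. The sketch below uses sprintf() so that each result can be inspected; the values are arbitrary and chosen only for illustration.

```php
<?php
$zeroPadded  = sprintf("%05d", 42);           // width 5, padded with zeros
$signed      = sprintf("%+d", 42);            // force a leading plus sign
$leftAligned = sprintf("%-8s|", "left");      // width 8, left aligned
$customPad   = sprintf("%'*10.2f", 3.14159);  // pad with '*', two decimals
$hex         = sprintf("%x", 255);            // lowercase hexadecimal
$binary      = sprintf("%b", 5);              // binary representation

echo implode("\n", array(
    $zeroPadded, $signed, $leftAligned, $customPad, $hex, $binary,
)), "\n";
```

A custom padding character is introduced with a single quote, as in the fourth call.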
Parsing Formatted Input
The sscanf() family of functions works in a similar way to printf(), except that, instead
of formatting output, it allows you to parse formatted input. For example, consider
the following:
$data = '123 456 789';
$format = '%d %d %d';
var_dump(sscanf($data, $format));
When this code is executed, the function interprets its input according to the rules
specified in the format string and returns an array that contains the parsed data:
array(3) {
[0]=>
int(123)
[1]=>
int(456)
[2]=>
int(789)
}
Note that the data must match the format passed to sscanf() exactly, or the function will
fail to retrieve all the values. For this reason, sscanf() is normally only useful
in those situations in which input follows a well-defined format (that is, it is not provided
by the user!).
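As a further sketch, sscanf() can mix literal text and several specifier types. The input strings below are made up for illustration. When extra variables are passed after the format, sscanf() assigns the parsed values to them by reference and returns the number of values assigned instead of an array.

```php
<?php
// Array form: each specifier yields one element of the result.
$parts = sscanf("January 01 2000", "%s %d %d");

// Reference form: values land in $age, the return value is a count.
$assigned = sscanf("age: 25", "age: %d", $age);

var_dump($parts, $assigned, $age);
```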
Perl-compatible Regular Expressions
Perl Compatible Regular Expressions (normally abbreviated as “PCRE”) offer a very
powerful string-matching and replacement mechanism that far surpasses anything
we have examined so far.
Regular expressions are often thought of as very complex—and they can be at
times. However, properly used they are relatively simple to understand and fairly
easy to use. Given their complexity, of course, they are also much more computationally
intensive than the simple search-and-replace functions we examined earlier
in this chapter. Therefore, you should use them only when appropriate, that is,
when using the simpler functions is either impossible or so complicated that it’s not
worth the effort.
A regular expression is a string that describes a set of matching rules. The simplest
possible regular expression is one that matches only one string; for example, Davey
matches only the string “Davey”. In fact, such a simple regular expression would be
pointless, as you could just as easily perform the match using strpos(), which is a
much faster alternative.
The real power of regular expressions comes into play when you don’t know the
exact string that you want to match. In this case, you can specify one or more metacharacters
and quantifiers, which do not have a literal meaning, but instead stand to
be interpreted in a special way.
In this chapter, we will discuss the basics of regular expressions that are required
by the exam. More thorough coverage is provided by the PHP manual, or by one of
the many regular expression books available (most notably, Mastering Regular Expressions,
by Jeffrey Friedl, published by O’Reilly Media).
Delimiters
A regular expression is always delimited by a starting and ending character. Any character
that is not alphanumeric, a backslash or whitespace can be used for this purpose (as long as the beginning and ending delimiter
match); since any occurrence of this character inside the expression itself must be
escaped, it’s usually a good idea to pick a delimiter that isn’t likely to appear inside
the expression. By convention, the forward slash is used for this purpose—although,
for example, another character like the octothorpe is sometimes used when dealing
with pathnames or URLs.
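The effect of the delimiter choice can be sketched with a URL. The address below is a placeholder; both patterns describe the same match, but the version delimited by forward slashes has to escape every slash in the URL.

```php
<?php
$url = 'http://example.com/docs/index.html';

// Forward-slash delimiter: each literal slash must be escaped.
$withSlashes = preg_match('/^http:\/\/example\.com\/docs\//', $url);

// Octothorpe delimiter: the slashes can be written as they are.
$withHashes = preg_match('#^http://example\.com/docs/#', $url);

var_dump($withSlashes, $withHashes);
```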
Metacharacters
The term “metacharacter” is a bit of a misnomer—as a metacharacter can actually
be composed of more than one character. However, every metacharacter represents
a single character in the matched expression. Here are the most common ones:
. Match any character
^ Match the start of the string
$ Match the end of the string
\s Match any whitespace character
\d Match any digit
\w Match any “word” character
Metacharacters can also be expressed using grouping expressions. For example, a
series of valid alternatives for a character can be provided by using square brackets:
/ab[cd]e/
The expression above will match both abce and abde. You can also use other
metacharacters, and provide ranges of valid characters inside a grouping expression:
/ab[c-e\d]/
This will match abc, abd, abe and any combination of ab followed by a digit.
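The grouping expression above can be tried out with preg_match(), which is covered later in this chapter. This is a small sketch with made-up subject strings.

```php
<?php
// The grouping expression from the text; the subjects are illustrative.
$pattern = '/ab[c-e\d]/';

$matchesLetter = preg_match($pattern, 'abd'); // 'd' lies inside the range c-e
$matchesDigit  = preg_match($pattern, 'ab7'); // '7' is matched by \d
$matchesOther  = preg_match($pattern, 'abz'); // 'z' fits neither alternative

var_dump($matchesLetter, $matchesDigit, $matchesOther);
```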
Quantifiers
A quantifier allows you to specify the number of times a particular character or
metacharacter can appear in a matched string. There are four types of quantifiers:
* The character can appear zero or more times
+ The character can appear one or more times
? The character can appear zero or one times
{n,m} The character can appear at least n times, and no more than m.
Either parameter can be omitted to indicate a minimum limit
with no maximum, or a maximum limit without a minimum, but
not both.
Thus, for example, the expression ab?c matches both ac and abc, while ab{1,3}c
matches abc, abbc and abbbc.
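These two expressions can be checked with a short sketch. The ^ and $ metacharacters are added here (an assumption beyond the patterns in the text) so that the whole subject has to match, which makes the upper limit of {1,3} visible.

```php
<?php
$optionalB   = '/^ab?c$/';
$oneToThreeB = '/^ab{1,3}c$/';

$results = array(
    'ac'     => preg_match($optionalB, 'ac'),       // zero "b" is allowed
    'abbc'   => preg_match($optionalB, 'abbc'),     // two "b" is too many
    'abbbc'  => preg_match($oneToThreeB, 'abbbc'),  // three "b" is the maximum
    'abbbbc' => preg_match($oneToThreeB, 'abbbbc'), // four "b" exceeds it
);

var_dump($results);
```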

Sub-Expressions
A sub-expression is a regular expression contained within the main regular expression
(or another sub-expression); you define one by encapsulating it in parentheses:
/a(bc.)e/
This expression will match the letter a, followed by the letters b and c, followed by
any character and, finally the letter e. As you can see, sub-expressions by themselves
do not have any influence on the way a regular expression is executed; however, you
can use them in conjunction with quantifiers to allow for complex expressions to
happen more than once. For example:
/a(bc.)+e/
This expression will match the letter a, followed by the expression bc. repeated one
or more times, followed by the letter e.
Sub-expressions can also be used as capturing patterns, which we will examine in
the next section.
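As a quick sketch of a quantified sub-expression, the pattern below is anchored with ^ and $ (an addition for this illustration) and run against made-up subjects. It also shows that when a sub-expression repeats, the text captured for it is that of its last repetition.

```php
<?php
$pattern = '/^a(bc.)+e$/';

// "bcX" and "bcY" are two repetitions of the sub-expression.
$repeated = preg_match($pattern, 'abcXbcYe', $matches);

// With no repetition at all, the + quantifier is not satisfied.
$missing = preg_match($pattern, 'ae');

// The captured group holds the last repetition only.
$lastGroup = $matches[1];

var_dump($repeated, $missing, $lastGroup);
```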

Matching and Extracting Strings
The preg_match() function can be used to match a regular expression against a given
string. The function returns 1 if the match is successful and 0 if it is not, and can return all the
captured subpatterns in an array if an optional third parameter is passed by reference.
Here’s an example:
$name = "Davey Shafik";
// Simple match
$regex = '/^[a-zA-Z\s]+$/'; // the whole string must consist of letters and whitespace
if (preg_match($regex, $name)) {
// Valid Name
}
// Match with subpatterns and capture
$regex = '/^(\w+)\s(\w+)/';
$matches = array();
if (preg_match ($regex, $name, $matches)) {
var_dump ($matches);
}
If you run the second example, you will notice that the $matches array is populated,
on return, with the following values:
array(3) {
[0]=>
string(12) "Davey Shafik"
[1]=>
string(5) "Davey"
[2]=>
string(6) "Shafik"
}
As you can see, the first element of the array contains the entire matched string,
while the second element (index 1) contains the first captured subpattern, and the
third element contains the second matched subpattern.

Performing Multiple Matches
The preg_match_all() function allows you to perform multiple matches on a given
string based on a single regular expression. For example:
$string = "a1bb b2cc c2dd";
$regex = "#([abc])\d#";
$matches = array();
if (preg_match_all ($regex, $string, $matches)) {
var_dump ($matches);
}
This script outputs the following:
array(2) {
[0]=>
array(3) {
[0]=>
string(2) "a1"
[1]=>
string(2) "b2"
[2]=>
string(2) "c2"
}
[1]=>
array(3) {
[0]=>
string(1) "a"
[1]=>
string(1) "b"
[2]=>
string(1) "c"
}
}
As you can see, all the whole-pattern matches are stored in the first sub-array of the
result, while the first captured subpattern of every match is stored in the corresponding
slot of the second sub-array.
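If grouping the results by match is more convenient, preg_match_all() also accepts the PREG_SET_ORDER flag as an optional fourth argument. This flag is not covered in the text above. The sketch reuses the same string and expression; each element of the result then holds one complete match followed by its captured subpattern.

```php
<?php
$string = "a1bb b2cc c2dd";
$regex = "#([abc])\d#";

// With PREG_SET_ORDER, $sets[0] describes the first match, and so on.
$count = preg_match_all($regex, $string, $sets, PREG_SET_ORDER);

var_dump($count, $sets);
```

The return value is the number of full matches found.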
Using PCRE to Replace Strings
Whilst str_replace() is quite flexible, it still only works on “whole” strings, that is,
where you know the exact text to search for. Using preg_replace(), however, you can
replace text that matches a pattern we specify. It is even possible to reuse captured
subpatterns directly in the substitution string by prefixing their index with a dollar
sign. In the example below, we use this technique to replace the entire matched
pattern with a string that is composed using the first captured subpattern ($1).
$body = "[b]Make Me Bold![/b]";
$regex = "@\[b\](.*?)\[/b\]@i";
$replacement = '<b>$1</b>';
$body = preg_replace($regex, $replacement, $body);
Just like with str_replace(), we can pass arrays of search and replacement arguments,
and we can also pass in an array of subjects
on which to perform the search-and-replace operation. This can speed things up
considerably, since the regular expression (or expressions) are compiled once and
reused multiple times. Here’s an example:
$subjects['body'] = "[b]Make Me Bold![/b]";
$subjects['subject'] = "[i]Make Me Italics![/i]";
$regex[] = "@\[b\](.*?)\[/b\]@i";
$regex[] = "@\[i\](.*?)\[/i\]@i";
$replacements[] = "<b>$1</b>";
$replacements[] = "<i>$1</i>";
$results = preg_replace($regex, $replacements, $subjects);
When you execute the code shown above, you will end up with an array that looks
like this:
array(2) {
["body"]=>
string(20) "<b>Make Me Bold!</b>"
["subject"]=>
string(23) "<i>Make Me Italics!</i>"
}
Notice how the resulting array maintains the array structure of our $subjects array
that we passed in, which, however, is not passed by reference, nor is it modified.

