

Source of: /manual/en/regexp.reference.repetition.php

<?php
include_once $_SERVER['DOCUMENT_ROOT'] . '/include/shared-manual.inc';
$TOC = array();
$PARENTS = array();
include_once dirname(__FILE__) . "/toc/reference.pcre.pattern.syntax.inc";
$setup = array (
  'home' =>
  array (
    0 => 'index.php',
    1 => 'PHP Manual',
  ),
  'head' =>
  array (
    0 => 'UTF-8',
    1 => 'en',
  ),
  'this' =>
  array (
    0 => 'regexp.reference.repetition.php',
    1 => 'Repetition',
  ),
  'up' =>
  array (
    0 => 'reference.pcre.pattern.syntax.php',
    1 => 'Pattern Syntax',
  ),
  'prev' =>
  array (
    0 => 'regexp.reference.subpatterns.php',
    1 => 'Subpatterns',
  ),
  'next' =>
  array (
    0 => 'regexp.reference.back-references.php',
    1 => 'Back references',
  ),
);
$setup["toc"] = $TOC;
$setup["parents"] = $PARENTS;
manual_setup($setup);

manual_header();
?>
<div id="regexp.reference.repetition" class="section">
     <h2 class="title">Repetition</h2>
     <p class="para">
     Repetition is specified by quantifiers, which can follow any
     of the following items:

      </p><ul class="itemizedlist">
       <li class="listitem"><span class="simpara">a single character, possibly escaped</span></li>
       <li class="listitem"><span class="simpara">the . metacharacter</span></li>
       <li class="listitem"><span class="simpara">a character class</span></li>
       <li class="listitem"><span class="simpara">a back reference (see next section)</span></li>
       <li class="listitem"><span class="simpara">a parenthesized subpattern (unless it is  an  assertion  -
     see below)</span></li>
      </ul><p>
    </p>
    <p class="para">
     The general repetition quantifier specifies  a  minimum  and
     maximum  number  of  permitted  matches,  by  giving the two
     numbers in curly brackets (braces), separated  by  a  comma.
     The  numbers  must be less than 65536, and the first must be
     less than or equal to the second. For example:

       <i>z{2,4}</i>

     matches &quot;zz&quot;, &quot;zzz&quot;, or &quot;zzzz&quot;. A closing brace on  its  own
     is not a special character. If the second number is omitted,
     but the comma is present, there is no upper  limit;  if  the
     second number and the comma are both omitted, the quantifier
     specifies an exact number of required matches. Thus

       <i>[aeiou]{3,}</i>

     matches at least 3 successive vowels,  but  may  match  many
     more, while

       <i>\d{8}</i>

     matches exactly 8 digits.  An  opening  curly  bracket  that
     appears  in a position where a quantifier is not allowed, or
     one that does not match the syntax of a quantifier, is taken
     as  a literal character. For example, {,6} is not a quantifier,
     but a literal string of four characters.
    </p>
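    <p class="para">
     The brace forms described above can be tried out with a short script.
     This is only a sketch: the subject strings are invented for
     illustration, and each result is stored in a variable rather than printed.
    </p>

```php
<?php
// Sketch only: the subject strings are made up for illustration.

// z{2,4} is greedy, so against five z's it takes the maximum of four.
preg_match('/z{2,4}/', 'zzzzz', $zMatch);

// [aeiou]{3,} needs at least three successive vowels and has no upper limit.
preg_match('/[aeiou]{3,}/', 'queueing', $vowelMatch);

// \d{8} means exactly eight digits; the anchors reject shorter or longer input.
$eightOk  = preg_match('/^\d{8}$/', '20240131');
$sevenBad = preg_match('/^\d{8}$/', '2024013');

// A brace that does not form a quantifier is an ordinary character.
$literalOpen  = preg_match('/^a{b$/', 'a{b');
$literalClose = preg_match('/^a}$/', 'a}');
```

    <p class="para">
     Here <i>$zMatch[0]</i> holds &quot;zzzz&quot; and <i>$vowelMatch[0]</i> holds
     &quot;ueuei&quot;; the two literal-brace checks both succeed.
    </p>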
    <p class="para">
     The quantifier {0} is permitted, causing the  expression  to
     behave  as  if the previous item and the quantifier were not
     present.
    </p>
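    <p class="para">
     As a minimal illustration of <i>{0}</i>, the hypothetical pattern
     <i>ab{0}c</i> below behaves as if it had been written <i>ac</i>.
    </p>

```php
<?php
// b{0} removes the "b" from the pattern entirely, leaving the equivalent of /^ac$/.
$matchesAc  = preg_match('/^ab{0}c$/', 'ac');   // the "b" is not required
$matchesAbc = preg_match('/^ab{0}c$/', 'abc');  // and it is not allowed either
```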
    <p class="para">
     For convenience (and  historical  compatibility)  the  three
     most common quantifiers have single-character abbreviations:

     </p><table class="doctable table">
      <caption><b>Single-character quantifiers</b></caption>
     
       <tbody valign="middle" class="tbody">
        <tr valign="middle">
         <td align="left"><i>*</i></td>
         <td align="left">equivalent to <i>{0,}</i></td>
        </tr>

        <tr valign="middle">
         <td align="left"><i>+</i></td>
         <td align="left">equivalent to <i>{1,}</i></td>
        </tr>

        <tr valign="middle">
         <td align="left"><i>?</i></td>
         <td align="left">equivalent to <i>{0,1}</i></td>
        </tr>

       </tbody>
     
     </table>
<p>
    </p>
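    <p class="para">
     The equivalences in the table can be spot-checked by running each
     abbreviation and its brace form against the same subjects. The
     patterns and subjects below are arbitrary examples, not an
     exhaustive proof.
    </p>

```php
<?php
// Each pair holds an abbreviated quantifier and its brace equivalent.
$pairs = [
    ['/^ab*c$/', '/^ab{0,}c$/'],
    ['/^ab+c$/', '/^ab{1,}c$/'],
    ['/^ab?c$/', '/^ab{0,1}c$/'],
];
$subjects = ['ac', 'abc', 'abbc', 'abbbc'];

// Stays true only if both forms give the same result for every subject.
$allAgree = true;
foreach ($pairs as [$short, $long]) {
    foreach ($subjects as $subject) {
        if (preg_match($short, $subject) !== preg_match($long, $subject)) {
            $allAgree = false;
        }
    }
}
```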
    <p class="para">
     It is possible to construct infinite loops  by  following  a
     subpattern  that  can  match no characters with a quantifier
     that has no upper limit, for example:

       <i>(a?)*</i>
    </p>
    <p class="para">
     Earlier versions of Perl and PCRE used to give an  error  at
     compile  time  for such patterns. However, because there are
     cases where this  can  be  useful,  such  patterns  are  now
     accepted,  but  if  any repetition of the subpattern does in
     fact match no characters, the loop is forcibly broken.
    </p>
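    <p class="para">
     A small sketch of the loop being broken: both calls below return
     normally even though <i>a?</i> can match the empty string
     indefinitely. The subject strings are invented for illustration.
    </p>

```php
<?php
// (a?)* could in principle repeat an empty match forever; PCRE stops the loop instead.
$rcWithAs    = preg_match('/(a?)*/', 'aab', $withAs);     // consumes the two leading a's
$rcWithoutAs = preg_match('/(a?)*/', 'bbb', $withoutAs);  // succeeds with an empty match
```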
    <p class="para">
     By default, the quantifiers  are  &quot;greedy&quot;,  that  is,  they
     match  as much as possible (up to the maximum number of permitted
     times), without causing the rest of  the  pattern  to
     fail. The classic example of where this gives problems is in
     trying to match comments in C programs. These appear between
     the  sequences /* and */ and within the sequence, individual
     * and / characters may appear. An attempt to  match  C  comments
     by applying the pattern

       <i>/\*.*\*/</i>

     to the string

       <i>/* first comment */  not comment  /* second comment */</i>

     fails, because it matches  the  entire  string  due  to  the
     greediness of the .*  item.
    </p>
    <p class="para">
     However, if a quantifier is followed  by  a  question  mark,
     then it ceases to be greedy, and instead matches the minimum
     number of times possible, so the pattern

       <i>/\*.*?\*/</i>

     does the right thing with the C comments. The meaning of the
     various  quantifiers is not otherwise changed, just the preferred
     number of matches.  Do not confuse this use of
     question  mark  with  its  use as a quantifier in its own right.
     Because it has two uses, it can sometimes appear doubled, as
     in

       <i>\d??\d</i>

     which matches one digit by preference, but can match two  if
     that is the only way the rest of the pattern matches.
    </p>
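    <p class="para">
     The difference between greedy and non-greedy matching can be
     observed with the C comment patterns above. In this sketch the
     <i>~</i> delimiter is an arbitrary choice that avoids escaping the
     slashes in the pattern.
    </p>

```php
<?php
$subject = '/* first comment */  not comment  /* second comment */';

// Greedy: .* runs to the end and backtracks only as far as the last */.
preg_match('~/\*.*\*/~', $subject, $greedy);

// Non-greedy: .*? stops at the first */ it can, so each comment matches separately.
preg_match_all('~/\*.*?\*/~', $subject, $lazy);

// \d??\d prefers one digit, but takes two when the rest of the pattern needs it.
preg_match('/\d??\d/', '12', $oneDigit);
preg_match('/^\d??\d$/', '12', $twoDigits);
```

    <p class="para">
     The greedy call captures the whole subject in <i>$greedy[0]</i>, while
     <i>$lazy[0]</i> holds the two comments as separate entries.
    </p>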
    <p class="para">
     If the <a href="reference.pcre.pattern.modifiers.php" class="link">PCRE_UNGREEDY</a> 
     option is set (an option which  is  not
     available  in  Perl)  then the quantifiers are not greedy by
     default, but individual ones can be made greedy by following
     them  with  a  question mark. In other words, it inverts the
     default behaviour.
    </p>
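    <p class="para">
     In PHP this option is set with the <i>U</i> pattern modifier. The
     following sketch, using a made-up subject string, shows the default
     being inverted.
    </p>

```php
<?php
// With U, a plain quantifier matches as little as possible.
preg_match('/a+/U', 'aaa', $ungreedy);

// With U, a trailing question mark makes the quantifier greedy again.
preg_match('/a+?/U', 'aaa', $greedyAgain);
```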
    <p class="para">
     Quantifiers followed by <i>+</i> are &quot;possessive&quot;. They consume
     as many characters as possible and never give any back to let the rest of the
     pattern match. Thus <i>.*abc</i> matches &quot;aabc&quot; but
     <i>.*+abc</i> doesn&#039;t, because <i>.*+</i> consumes the
     whole string and leaves nothing for <i>abc</i>. Possessive quantifiers, available
     since PHP 4.3.3, can speed up matching by avoiding pointless backtracking.
    </p>
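    <p class="para">
     The possessive behaviour can be checked directly. The quoted-string
     pattern in this sketch is an added illustration, not taken from the
     text above; it shows a case where a possessive quantifier still
     matches because nothing needs to be given back.
    </p>

```php
<?php
// Greedy .* backtracks three characters so that abc can match.
$greedyResult = preg_match('/.*abc/', 'aabc');

// Possessive .*+ keeps everything it consumed, so abc has nothing left to match.
$possessiveResult = preg_match('/.*+abc/', 'aabc');

// [^"]*+ cannot run past the closing quote, so no backtracking is needed here.
preg_match('/"[^"]*+"/', 'say "hi" now', $quoted);
```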
    <p class="para">
     When a parenthesized subpattern is quantified with a minimum
     repeat count that is greater than 1 or with a limited maximum,
     more memory is required for the compiled pattern, in
     proportion to the size of the minimum or maximum.
    </p>
    <p class="para">
     If a pattern starts with .* or  .{0,}  and  the  <a href="reference.pcre.pattern.modifiers.php" class="link">PCRE_DOTALL</a>
     option (equivalent to Perl&#039;s /s) is set, thus allowing the .
     to match newlines, then the pattern is implicitly  anchored,
     because whatever follows will be tried against every character
     position in the subject string, so there is no point  in
     retrying  the overall match at any position after the first.
     PCRE treats such a pattern as though it were preceded by \A.
     In  cases where it is known that the subject string contains
     no newlines, it is worth setting <a href="reference.pcre.pattern.modifiers.php" class="link">PCRE_DOTALL</a>  when  the 
     pattern begins with .* in order to
     obtain this optimization, or
     alternatively using ^ to indicate anchoring explicitly.
    </p>
    <p class="para">
     When a capturing subpattern is repeated, the value  captured
     is the substring that matched the final iteration. For example, after

       <i>(tweedle[dume]{3}\s*)+</i>

     has matched &quot;tweedledum tweedledee&quot; the value  of  the  captured
     substring  is  &quot;tweedledee&quot;.  However,  if  there are
     nested capturing  subpatterns,  the  corresponding  captured
     values  may  have been set in previous iterations. For example,
     after
    
       <i>/(a|(b))+/</i>

     matches &quot;aba&quot; the value of the second captured substring  is
     &quot;b&quot;.
     </p>
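    <p class="para">
     Both capture examples above can be reproduced with
     <i>preg_match()</i>; the variable names in this sketch are arbitrary.
    </p>

```php
<?php
// The repeated group keeps only the text from its final iteration.
preg_match('/(tweedle[dume]{3}\s*)+/', 'tweedledum tweedledee', $tweedle);

// The inner group took part only in the second iteration, yet its value survives.
preg_match('/(a|(b))+/', 'aba', $nested);
```

    <p class="para">
     After these calls <i>$tweedle[1]</i> is &quot;tweedledee&quot;,
     <i>$nested[1]</i> is &quot;a&quot; and <i>$nested[2]</i> is &quot;b&quot;.
    </p>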
    </div><?php manual_footer(); ?>
 