Source of: /manual/en/regexp.reference.performances.php

<?php
include_once $_SERVER['DOCUMENT_ROOT'] . '/include/shared-manual.inc';
$TOC = array();
$PARENTS = array();
include_once dirname(__FILE__) . "/toc/reference.pcre.pattern.syntax.inc";
$setup = array (
  'home' => array (
    0 => 'index.php',
    1 => 'PHP Manual',
  ),
  'head' => array (
    0 => 'UTF-8',
    1 => 'en',
  ),
  'this' => array (
    0 => 'regexp.reference.performances.php',
    1 => 'Performances',
  ),
  'up' => array (
    0 => 'reference.pcre.pattern.syntax.php',
    1 => 'Pattern Syntax',
  ),
  'prev' => array (
    0 => 'regexp.reference.recursive.php',
    1 => 'Recursive patterns',
  ),
  'next' => array (
    0 => 'ref.pcre.php',
    1 => 'PCRE Functions',
  ),
);
$setup["toc"] = $TOC;
$setup["parents"] = $PARENTS;
manual_setup($setup);

manual_header();
?>
<div id="regexp.reference.performances" class="section">
     <h2 class="title">Performances</h2>
     <p class="para">
     Certain items that may appear in patterns are more efficient
     than others. It is more efficient to use a character class
     like [aeiou] than a set of alternatives such as (a|e|i|o|u).
     In general, the simplest construction that provides the
     required behaviour is the most efficient. Jeffrey
     Friedl&#039;s book <i>Mastering Regular Expressions</i> contains a lot of discussion about optimizing
     regular expressions for efficient performance.
    </p>
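    <p class="para">
     As a rough illustration of the point above, the following sketch checks that the
     two forms find the same matches and then times them. The subject string, the
     helper name timePattern() and the run count are assumptions made for this
     example; absolute timings depend on the PCRE version, JIT settings and hardware,
     so treat the numbers as indicative only.
    </p>

```php
<?php
$subject = 'The quick brown fox jumps over the lazy dog';

// Both patterns find the same vowels; only the work PCRE does differs.
$classCount = preg_match_all('/[aeiou]/', $subject, $classMatches);
$altCount   = preg_match_all('/(a|e|i|o|u)/', $subject, $altMatches);

var_dump($classCount === $altCount);            // same number of matches
var_dump($classMatches[0] === $altMatches[0]);  // same matched text

// Hypothetical helper (not part of PHP): time repeated runs of one pattern.
function timePattern(string $pattern, string $subject, int $runs = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        preg_match_all($pattern, $subject);
    }
    return microtime(true) - $start;
}

printf("character class: %.4fs\n", timePattern('/[aeiou]/', $subject));
printf("alternation:     %.4fs\n", timePattern('/(a|e|i|o|u)/', $subject));
```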
    <p class="para">
     When a pattern begins with .* and the <a href="reference.pcre.pattern.modifiers.php" class="link">PCRE_DOTALL</a> option (the s modifier) is
     set, the pattern is implicitly anchored by PCRE, since it
     can match only at the start of a subject string. However, if
     <a href="reference.pcre.pattern.modifiers.php" class="link">PCRE_DOTALL</a>
     is not set, PCRE cannot make this optimization,
     because the . metacharacter does not then match a newline,
     and if the subject string contains newlines, the pattern may
     match from the character immediately following one of them
     instead of from the very start. For example, the pattern

       <i>(.*) second</i>

     matches the subject &quot;first\nand second&quot; (where \n stands for
     a newline character) with the first captured substring being
     &quot;and&quot;. In order to do this, PCRE has to retry the match
     starting after every newline in the subject.
    </p>
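    <p class="para">
     The example above can be reproduced with preg_match(). This is a minimal sketch;
     the variable names are chosen for illustration only. It also shows what the same
     pattern captures once the s modifier (PCRE_DOTALL) is added.
    </p>

```php
<?php
$subject = "first\nand second";

// Without the s modifier, . does not match the newline, so the match
// can only begin after it and the group captures "and".
preg_match('/(.*) second/', $subject, $withoutDotall);
var_dump($withoutDotall[1]);

// With the s modifier, . also matches the newline, so the match begins
// at the start of the subject and the group captures "first\nand".
preg_match('/(.*) second/s', $subject, $withDotall);
var_dump($withDotall[1]);
```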
    <p class="para">
     If you are using such a pattern with subject strings that do
     not contain newlines, the best performance is obtained by
     setting <a href="reference.pcre.pattern.modifiers.php" class="link">PCRE_DOTALL</a>,
     or starting the pattern with ^.* to
     indicate explicit anchoring. That saves PCRE from having to
     scan along the subject looking for a newline to restart at.
    </p>
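    <p class="para">
     For a subject without newlines, the unanchored pattern and the two variants
     suggested above return the same capture, so switching to either variant changes
     only how much work PCRE does. The sketch below assumes a sample subject and
     array keys chosen purely for illustration.
    </p>

```php
<?php
$subject = 'first and second';   // sample subject with no newline

$patterns = [
    'unanchored' => '/(.*) second/',
    'dotall'     => '/(.*) second/s',   // s modifier sets PCRE_DOTALL
    'explicit'   => '/^(.*) second/',   // ^ anchors at the start of the subject
];

$captures = [];
foreach ($patterns as $name => $pattern) {
    preg_match($pattern, $subject, $m);
    $captures[$name] = $m[1];
}

// All three variants capture the same text for this subject.
var_dump($captures);
```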
    <p class="para">
     Beware of patterns that contain nested indefinite repeats.
     These can take a long time to run when applied to a string
     that does not match. Consider the pattern fragment

       <i>(a+)*</i>
    </p>
    <p class="para">
     Starting from the first character, this can match &quot;aaaa&quot; in 16 different ways
     (counting the empty match), and this number
     increases very rapidly as the string gets longer. (The *
     repeat can match 0, 1, 2, 3, or 4 times, and for each of
     those cases other than 0 or 4, the + repeats can match different
     numbers of times.) When the remainder of the pattern is such
     that the entire match is going to fail, PCRE has in principle
     to try every possible variation, and this can take an
     extremely long time.
    </p>
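    <p class="para">
     To get a feel for how quickly the number of possibilities grows, the sketch
     below counts the ways in which a fragment of the form (a+)* can match at the
     start of a run of &quot;a&quot; characters. It rests on two assumptions: the match
     always begins at the first character, and the empty match counts as one way.
     The function countWays() is a hypothetical helper written for this illustration
     and does not reflect how PCRE itself works internally.
    </p>

```php
<?php
// Hypothetical helper: count the ways (a+)* can match at the start of a
// run of $remaining "a" characters, including the empty match.
function countWays(int $remaining): int
{
    // One way: the * repeat stops here and takes no further iteration.
    $ways = 1;

    // Otherwise the next a+ iteration takes 1..$remaining characters and
    // the * repeat continues on whatever is left.
    for ($length = 1; $length <= $remaining; $length++) {
        $ways += countWays($remaining - $length);
    }

    return $ways;
}

// The count doubles with every additional character.
foreach ([1, 2, 3, 4, 10, 20] as $n) {
    printf("%2d characters: %d ways\n", $n, countWays($n));
}
```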
    <p class="para">
     An optimization catches some of the simpler cases such
     as

       <i>(a+)*b</i>

     where a literal character follows. Before embarking  on  the
     standard matching procedure, PCRE checks that there is a &quot;b&quot;
     later in the subject string, and if there is not,  it  fails
     the  match  immediately. However, when there is no following
     literal this optimization cannot be used. You  can  see  the
     difference by comparing the behaviour of

       <i>(a+)*\d</i>

     with the pattern above. The pattern ending in b gives a failure almost
     instantly when applied to a whole line of &quot;a&quot; characters,
     whereas the pattern ending in \d takes an appreciable time with strings
     longer than about 20 characters.
     </p>
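    <p class="para">
     The comparison can be tried with a small timing sketch. The helper timedMatch()
     and the subject length of 22 are assumptions made for this example. The outcome
     depends on the PCRE version, on whether JIT is enabled, and on the
     pcre.backtrack_limit setting: instead of running for a long time, preg_match()
     may return false and preg_last_error() may report that a limit was reached.
    </p>

```php
<?php
// Hypothetical helper: run one match and report its result, the PCRE
// error code and the elapsed time in seconds.
function timedMatch(string $pattern, string $subject): array
{
    $start   = microtime(true);
    $result  = preg_match($pattern, $subject);
    $elapsed = microtime(true) - $start;

    // preg_match() returns 1 on a match, 0 on no match, and false on
    // error (for example when pcre.backtrack_limit is exceeded).
    return [
        'result'  => $result,
        'error'   => preg_last_error(),
        'seconds' => $elapsed,
    ];
}

$subject = str_repeat('a', 22);   // a line consisting only of "a" characters

foreach (['/(a+)*b/', '/(a+)*\d/'] as $pattern) {
    $r = timedMatch($pattern, $subject);
    printf(
        "%-10s result=%s error=%d time=%.6fs\n",
        $pattern,
        var_export($r['result'], true),
        $r['error'],
        $r['seconds']
    );
}
```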
    </div><?php manual_footer(); ?>
 