Elliot

Reputation: 5541

Regex Crash on pattern

If I try to match data of the form

6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, 1.0, 9.549297, 12.604, 12.258, 0.714172
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000829, 0.00014, 2.0, 19.098593, 24.036, 23.266, 2.723789
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000207, 7.4e-05, 4.0, 38.197186, 45.535, 43.987, 10.320451
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000164, 6.1e-05, 5.0, 47.746483, 55.276, 53.18, 15.660281
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000144, 5.3e-05, 6.0, 57.29578, 64.029, 61.729, 21.767831
6.0, 10.64, 5.23, 6.66, 0.81, 30, 9.2e-05, 4.6e-05, 7.0, 66.845076, 74.073, 71.162, 29.379847
6.0, 10.64, 5.23, 6.66, 0.81, 30, 7.7e-05, 4.1e-05, 8.0, 76.394373, 83.119, 79.763, 37.677382
6.0, 10.64, 5.23, 6.66, 0.81, 30, 6.4e-05, 3.7e-05, 9.0, 85.943669, 92.484, 88.643, 47.162835
6.0, 10.64, 5.23, 6.66, 0.81, 30, 5.2e-05, 3.3e-05, 10.0, 95.492966, 102.025, 97.861, 57.808909
6.0, 10.64, 5.23, 6.66, 0.81, 30, 3.1e-05, 2.4e-05, 15.0, 143.239449, 144.605, 138.215, 122.904018
6.0, 10.64, 5.23, 6.66, 0.81, 30, 1.6e-05, 1.8e-05, 20.0, 190.985932, 189.013, 179.673, 214.196754
6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412

line by line (via findall) using 13 instances of ([-]?[\.\d]*[eE]?[-]?[\.\d]*), each followed by a comma and a space except for the last one,

([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*)

the regex locks up or crashes. If I match only 12 instances, it works fine. I don't understand why matching 12 numbers is OK but matching 13 is instant death. Does anyone know what is going on here? Note that although this data set doesn't happen to have scientific notation in every column, it can occur in any of them, which is why I allow for it in all the columns.

Upvotes: 0

Views: 318

Answers (2)

user557597


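The problem appears to be catastrophic backtracking. Every part of your number pattern is optional, so the engine can divide the same text between the sub-patterns in a very large number of ways.
Everything can stay optional if specific anchors are introduced.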
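
To get a feel for the numbers, here is a rough Python sketch (assuming Python, since the question mentions findall; the helper name count_splits is made up for illustration). It does not run the slow pattern. It only counts how many ways one field can be divided between the two [\.\d]* runs when the optional [eE] and [-] between them match nothing. When a match attempt ultimately fails (for example on a line with fewer fields than the pattern expects, which is an assumption, since the question shows no such line), a backtracking engine can end up trying the product of these counts.

```python
import re
from math import prod

# The character run that appears twice inside each number group
# of the question's pattern.
RUN = re.compile(r"[\.\d]*")

def count_splits(token: str) -> int:
    """Count the ways `token` can be divided between the two character
    runs when the optional exponent letter and sign match nothing."""
    return sum(
        1
        for i in range(len(token) + 1)
        if RUN.fullmatch(token[:i]) and RUN.fullmatch(token[i:])
    )

# Fields 2 to 13 of the first data row in the question.
fields = (
    "10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, "
    "1.0, 9.549297, 12.604, 12.258, 0.714172"
).split(", ")

per_field = [count_splits(f) for f in fields]
total = prod(per_field)  # combinations across all twelve fields
print(per_field, total)
```

For these twelve fields the product comes to roughly 2.9 billion, which would explain a lock-up on a single failed attempt.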

Here are two sample regexes that show how to use an all-optional form.
Both regexes use the multi-line mode option.

 #--------------------------------
 # Multiple numbers, single line
 # (?i)(?:(?:^|\h*,\h*)(?=[^e\s,]*\d)[+-]?\d*\.?\d*(?:e[+-]?\d+)?(?:$|(?=[\h,])))+
 #--------------------------------
 (?i)                        # Case insensitive modifier
 (?:
      (?: ^ | \h* , \h* )    # Start of line, or a comma with optional horizontal whitespace around it
      (?= [^e\s,]* \d )      # A digit must follow before any 'e', whitespace or comma
      [+-]? \d* \.? \d*      # Consume correct numeric form 
      (?: e [+-]? \d+ )?     # Consume correct exponent form
      (?:                    # End of string or horizontal whitespace or comma ahead
           $ 
        |  (?= [\h,] )
      )
 )+

 #-------------------
 # Single number
 # (?i)(?:^|(?<=\h))(?=[^e\s,]*\d)[+-]?\d*\.?\d*(?:e[+-]?\d+)?(?:$|(?=[\h,]))
 #-------------------
 (?i)                        # Case insensitive modifier
 (?:                         # Start of line, or horizontal whitespace behind
      ^ 
   |  (?<= \h )
 )
 (?= [^e\s,]* \d )           # A digit must follow before any 'e', whitespace or comma
 [+-]? \d* \.? \d*           # Consume correct numeric form 
 (?: e [+-]? \d+ )?          # Consume correct exponent form
 (?:                         # End of string or horizontal whitespace or comma ahead
      $ 
   |  (?= [\h,] )
 )

Perl test case

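use strict;
use warnings;

local $/ = undef;    # slurp mode: read everything after __DATA__ at once

my $str = <DATA>;

# /m lets ^ and $ match at line boundaries; /g steps through every match
while ( $str =~ /(?i)(?:(?:^|\h*,\h*)(?=[^e\s,]*\d)[+-]?\d*\.?\d*(?:e[+-]?\d+)?(?:$|(?=[\h,])))+/mg)
{
    print "Matched  '$&'\n";
}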


__DATA__

6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, 1.0, 9.549297, 12.604, 12.258, 0.714172
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000829, 0.00014, 2.0, 19.098593, 24.036, 23.266, 2.723789
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000207, 7.4e-05, 4.0, 38.197186, 45.535, 43.987, 10.320451
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000164, 6.1e-05, 5.0, 47.746483, 55.276, 53.18, 15.660281
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000144, 5.3e-05, 6.0, 57.29578, 64.029, 61.729, 21.767831
6.0, 10.64, 5.23, 6.66, 0.81, 30, 9.2e-05, 4.6e-05, 7.0, 66.845076, 74.073, 71.162, 29.379847
6.0, 10.64, 5.23, 6.66, 0.81, 30, 7.7e-05, 4.1e-05, 8.0, 76.394373, 83.119, 79.763, 37.677382
6.0, 10.64, 5.23, 6.66, 0.81, 30, 6.4e-05, 3.7e-05, 9.0, 85.943669, 92.484, 88.643, 47.162835
6.0, 10.64, 5.23, 6.66, 0.81, 30, 5.2e-05, 3.3e-05, 10.0, 95.492966, 102.025, 97.861, 57.808909
6.0, 10.64, 5.23, 6.66, 0.81, 30, 3.1e-05, 2.4e-05, 15.0, 143.239449, 144.605, 138.215, 122.904018
6.0, 10.64, 5.23, 6.66, 0.81, 30, 1.6e-05, 1.8e-05, 20.0, 190.985932, 189.013, 179.673, 214.196754
6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412

Output >>

Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, 1.0, 9.549297, 12.604, 12.258, 0.714172'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000829, 0.00014, 2.0, 19.098593, 24.036, 23.266, 2.723789'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000207, 7.4e-05, 4.0, 38.197186, 45.535, 43.987, 10.320451'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000164, 6.1e-05, 5.0, 47.746483, 55.276, 53.18, 15.660281'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000144, 5.3e-05, 6.0, 57.29578, 64.029, 61.729, 21.767831'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 9.2e-05, 4.6e-05, 7.0, 66.845076, 74.073, 71.162, 29.379847'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 7.7e-05, 4.1e-05, 8.0, 76.394373, 83.119, 79.763, 37.677382'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 6.4e-05, 3.7e-05, 9.0, 85.943669, 92.484, 88.643, 47.162835'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 5.2e-05, 3.3e-05, 10.0, 95.492966, 102.025, 97.861, 57.808909'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 3.1e-05, 2.4e-05, 15.0, 143.239449, 144.605, 138.215, 122.904018'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 1.6e-05, 1.8e-05, 20.0, 190.985932, 189.013, 179.673, 214.196754'
Matched  '6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412'
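
Since the question mentions findall, here is a rough Python port of the multi-number regex, offered as a sketch rather than a tested drop-in. Python's re module does not support \h, so [ \t] is substituted for it (my substitution), and the case-insensitive and multi-line options are passed as flags. The names LINE, data and matches are only for illustration.

```python
import re

# Port of the multi-number pattern; [ \t] stands in for \h,
# which Python's re module does not recognise.
LINE = re.compile(
    r"(?:(?:^|[ \t]*,[ \t]*)(?=[^e\s,]*\d)"
    r"[+-]?\d*\.?\d*(?:e[+-]?\d+)?(?:$|(?=[ \t,])))+",
    re.IGNORECASE | re.MULTILINE,
)

data = (
    "6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778\n"
    "6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412\n"
)

# The pattern has no capturing groups, so findall returns whole matches.
matches = LINE.findall(data)

for text in matches:
    # float() tolerates the space left in front of each field by split.
    values = [float(v) for v in text.split(",")]
    print(len(values), values)
```

Each match is a whole line, so the individual numbers are recovered by splitting on the comma afterwards instead of using 13 capture groups.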

Upvotes: 0

tenub

Reputation: 3446

Try this and report back:

^(?:-?(?:\d+\.)?\d+(?:[eE]-?\d+)?(?:,\s*|$)){13}
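
In case it helps with reporting back, this is a minimal sketch of how the pattern might be used from Python, assuming multi-line mode so that ^ and $ apply per line. The names ROW, sample and rows are mine. Note that the pattern as written accepts only a leading "-" (not "+") and needs a digit before the decimal point.

```python
import re

# The pattern from this answer; MULTILINE makes ^ and $ match per line.
ROW = re.compile(
    r"^(?:-?(?:\d+\.)?\d+(?:[eE]-?\d+)?(?:,\s*|$)){13}",
    re.MULTILINE,
)

sample = (
    "6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778\n"
    "6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412\n"
)

# Each match is one full line of 13 numbers; split it to get the values.
rows = [
    [float(field) for field in m.group(0).split(",")]
    for m in ROW.finditer(sample)
]

for row in rows:
    print(row)
```

Because each number has exactly one way to match, a line that does not fit should fail quickly instead of backtracking through many combinations.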

Upvotes: 0
