Reputation: 5541
If I try to match data of the form
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, 1.0, 9.549297, 12.604, 12.258, 0.714172
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000829, 0.00014, 2.0, 19.098593, 24.036, 23.266, 2.723789
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000207, 7.4e-05, 4.0, 38.197186, 45.535, 43.987, 10.320451
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000164, 6.1e-05, 5.0, 47.746483, 55.276, 53.18, 15.660281
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000144, 5.3e-05, 6.0, 57.29578, 64.029, 61.729, 21.767831
6.0, 10.64, 5.23, 6.66, 0.81, 30, 9.2e-05, 4.6e-05, 7.0, 66.845076, 74.073, 71.162, 29.379847
6.0, 10.64, 5.23, 6.66, 0.81, 30, 7.7e-05, 4.1e-05, 8.0, 76.394373, 83.119, 79.763, 37.677382
6.0, 10.64, 5.23, 6.66, 0.81, 30, 6.4e-05, 3.7e-05, 9.0, 85.943669, 92.484, 88.643, 47.162835
6.0, 10.64, 5.23, 6.66, 0.81, 30, 5.2e-05, 3.3e-05, 10.0, 95.492966, 102.025, 97.861, 57.808909
6.0, 10.64, 5.23, 6.66, 0.81, 30, 3.1e-05, 2.4e-05, 15.0, 143.239449, 144.605, 138.215, 122.904018
6.0, 10.64, 5.23, 6.66, 0.81, 30, 1.6e-05, 1.8e-05, 20.0, 190.985932, 189.013, 179.673, 214.196754
6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412
line by line (via findall) using 13 instances of ([-]?[\.\d]*[eE]?[-]?[\.\d]*), each followed by a comma and a space except for the last one:
([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*), ([-]?[\.\d]*[eE]?[-]?[\.\d]*)
the regex locks up or crashes. If I try to match 12 repetitions instead, it works fine. I don't understand why matching 12 numbers is OK but matching 13 is instant death. Does anyone know what is going on here? Note that although this data set doesn't happen to have scientific notation in every column, it can occur in any of them, which is why every column's group allows for it.
Upvotes: 0
Views: 318
Reputation:
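Apparently the problem is catastrophic backtracking. You are making everything optional, so each group can match the same number in several different ways, and when a line fails to match the engine retries every combination of those ways across all the groups.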
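To make the "everything optional" point concrete, here is a minimal sketch. It assumes the question is using Python's re module (findall suggests that, but the post does not say so). The names ways_to_match and first_row are made up for illustration. The sketch counts the distinct ways the groups can match one row; it is not a measurement of what the engine actually does on any particular input.

```python
import math
import re

# One column group copied from the question.
GROUP = re.compile(r"([-]?[\.\d]*[eE]?[-]?[\.\d]*)")

# Every piece is optional, so the group accepts the empty string
# and strings that are not numbers at all.
accepted = [s for s in ("", "e", "-", "1.2.3", "9.5e-05") if GROUP.fullmatch(s)]


def ways_to_match(token: str) -> int:
    """Count the ways a plain token can be divided between the two digit-or-dot runs."""
    run = re.compile(r"[\.\d]*")
    return sum(
        1
        for cut in range(len(token) + 1)
        if run.fullmatch(token[:cut]) and run.fullmatch(token[cut:])
    )


# First data row from the question.
first_row = "6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, 1.0, 9.549297, 12.604, 12.258, 0.714172"
per_column = [ways_to_match(tok) for tok in first_row.split(", ")]
total = math.prod(per_column)  # combinations across all 13 columns

print(accepted)
print(per_column)
print(total)
```

Each extra column multiplies the total by the number of ways its own token can be divided, which is why adding one more group can turn a slow match into one that appears to hang.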
Everything can stay optional as long as specific anchors are introduced.
Here are two sample regexes that show how to use an all-optional form.
Both regexes use the multi-line mode option.
#--------------------------------
# Multiple numbers, single line
# (?i)(?:(?:^|\h*,\h*)(?=[^e\s,]*\d)[+-]?\d*\.?\d*(?:e[+-]?\d+)?(?:$|(?=[\h,])))+
#--------------------------------
 (?i)                          # Case insensitive modifier
 (?:
      (?: ^ | \h* , \h* )      # Beginning of line, or a comma with optional horizontal whitespace around it
      (?= [^e\s,]* \d )        # Lookahead: a digit must come before any exponent, whitespace or comma
      [+-]? \d* \.? \d*        # Consume correct numeric form
      (?: e [+-]? \d+ )?       # Consume correct exponent form
      (?:                      # End of line, or horizontal whitespace or comma ahead
           $
        |  (?= [\h,] )
      )
 )+
#-------------------
# Single number
# (?i)(?:^|(?<=\h))(?=[^e\s,]*\d)[+-]?\d*\.?\d*(?:e[+-]?\d+)?(?:$|(?=[\h,]))
#-------------------
 (?i)                          # Case insensitive modifier
 (?:                           # Beginning of line, or horizontal whitespace behind
      ^
   |  (?<= \h )
 )
 (?= [^e\s,]* \d )             # Lookahead: a digit must come before any exponent, whitespace or comma
 [+-]? \d* \.? \d*             # Consume correct numeric form
 (?: e [+-]? \d+ )?            # Consume correct exponent form
 (?:                           # End of line, or horizontal whitespace or comma ahead
      $
   |  (?= [\h,] )
 )
Perl test case
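use strict;
use warnings;

local $/ = undef;    # slurp mode: read all of DATA in one go
my $str = <DATA>;
# /m lets ^ and $ match at line boundaries, /g iterates over every match
while ( $str =~ /(?i)(?:(?:^|\h*,\h*)(?=[^e\s,]*\d)[+-]?\d*\.?\d*(?:e[+-]?\d+)?(?:$|(?=[\h,])))+/mg)
{
    print "Matched '$&'\n";
}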
__DATA__
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, 1.0, 9.549297, 12.604, 12.258, 0.714172
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000829, 0.00014, 2.0, 19.098593, 24.036, 23.266, 2.723789
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000207, 7.4e-05, 4.0, 38.197186, 45.535, 43.987, 10.320451
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000164, 6.1e-05, 5.0, 47.746483, 55.276, 53.18, 15.660281
6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000144, 5.3e-05, 6.0, 57.29578, 64.029, 61.729, 21.767831
6.0, 10.64, 5.23, 6.66, 0.81, 30, 9.2e-05, 4.6e-05, 7.0, 66.845076, 74.073, 71.162, 29.379847
6.0, 10.64, 5.23, 6.66, 0.81, 30, 7.7e-05, 4.1e-05, 8.0, 76.394373, 83.119, 79.763, 37.677382
6.0, 10.64, 5.23, 6.66, 0.81, 30, 6.4e-05, 3.7e-05, 9.0, 85.943669, 92.484, 88.643, 47.162835
6.0, 10.64, 5.23, 6.66, 0.81, 30, 5.2e-05, 3.3e-05, 10.0, 95.492966, 102.025, 97.861, 57.808909
6.0, 10.64, 5.23, 6.66, 0.81, 30, 3.1e-05, 2.4e-05, 15.0, 143.239449, 144.605, 138.215, 122.904018
6.0, 10.64, 5.23, 6.66, 0.81, 30, 1.6e-05, 1.8e-05, 20.0, 190.985932, 189.013, 179.673, 214.196754
6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412
Output >>
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.002304, 0.000267, 1.0, 9.549297, 12.604, 12.258, 0.714172'
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000829, 0.00014, 2.0, 19.098593, 24.036, 23.266, 2.723789'
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778'
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000207, 7.4e-05, 4.0, 38.197186, 45.535, 43.987, 10.320451'
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000164, 6.1e-05, 5.0, 47.746483, 55.276, 53.18, 15.660281'
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000144, 5.3e-05, 6.0, 57.29578, 64.029, 61.729, 21.767831'
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 9.2e-05, 4.6e-05, 7.0, 66.845076, 74.073, 71.162, 29.379847'
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 7.7e-05, 4.1e-05, 8.0, 76.394373, 83.119, 79.763, 37.677382'
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 6.4e-05, 3.7e-05, 9.0, 85.943669, 92.484, 88.643, 47.162835'
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 5.2e-05, 3.3e-05, 10.0, 95.492966, 102.025, 97.861, 57.808909'
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 3.1e-05, 2.4e-05, 15.0, 143.239449, 144.605, 138.215, 122.904018'
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 1.6e-05, 1.8e-05, 20.0, 190.985932, 189.013, 179.673, 214.196754'
Matched '6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412'
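The regexes above use \h, which Perl and PCRE understand but Python's re module does not. If the findall in the question is Python's (an assumption, the post does not say), a rough port of the multiple-number regex could look like the sketch below. It spells horizontal whitespace as [ \t], which is narrower than Perl's \h, and the names NUMBER_LIST, sample and rows are made up for illustration.

```python
import re

# Port of the multiple-number regex; [ \t] stands in for Perl's \h.
NUMBER_LIST = re.compile(
    r"""
    (?:
        (?: ^ | [ \t]* , [ \t]* )   # start of line, or a comma with optional blanks around it
        (?= [^e\s,]* \d )           # a digit must come before any exponent, whitespace or comma
        [+-]? \d* \.? \d*           # sign, integer part, decimal point, fraction
        (?: e [+-]? \d+ )?          # optional exponent
        (?: $ | (?= [ \t,] ) )      # end of line, or a blank or comma ahead
    )+
    """,
    re.IGNORECASE | re.MULTILINE | re.VERBOSE,
)

# Two rows taken from the question's data.
sample = (
    "6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778\n"
    "6.0, 10.64, 5.23, 6.66, 0.81, 30, 1e-05, 1.5e-05, 25.0, 238.732415, 231.256, 219.497, 327.58412\n"
)

# Each match is one whole line; split it on commas to get the columns.
rows = [
    [float(field) for field in match.group(0).split(",")]
    for match in NUMBER_LIST.finditer(sample)
]

for row in rows:
    print(len(row), row)
```

Because the regex matches the whole list rather than capturing 13 separate groups, the columns are recovered afterwards with split and float, which also tolerates the blank after each comma.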
Upvotes: 0
Reputation: 3446
Try this and report back:
^(?:-?(?:\d+\.)?\d+(?:[eE]-?\d+)?(?:,\s*|$)){13}
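For anyone who wants to try the regex above from Python (assuming that is where the question's findall comes from), here is a minimal sketch. The helper name parse_row and the sample strings are made up for illustration.

```python
import re
from typing import List, Optional

# The regex from this answer, unchanged; MULTILINE lets ^ and $ work per line.
ROW = re.compile(r"^(?:-?(?:\d+\.)?\d+(?:[eE]-?\d+)?(?:,\s*|$)){13}", re.MULTILINE)


def parse_row(line: str) -> Optional[List[float]]:
    """Return the 13 numbers on the line, or None if the line does not match."""
    match = ROW.match(line)
    if match is None:
        return None
    return [float(field) for field in match.group(0).split(",")]


good = "6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25, 6.032778"
short = "6.0, 10.64, 5.23, 6.66, 0.81, 30, 0.000369, 9.5e-05, 3.0, 28.64789, 35.49, 34.25"

print(parse_row(good))
print(parse_row(short))  # only 12 numbers, so no match
```

Here every number must contain at least one digit and every repetition must end in a comma or the end of the line, so there is only one way to match each column. Note that, as written, the regex does not accept a leading plus sign, a bare ".5", or an exponent such as e+05.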
Upvotes: 0