Reputation: 254
I am trying to write a regex that will get me the contents of the 21st field in this list for the lines that start with an I, provided that the field contains a number in the format nnn-nnnnnn (like 001-123456):
T|112|| | | |AZ |D |1 | 1|
I| 10|ACAA |BY CORD EACH | 10.00-| .99 | | .36 |1 | 1|D |I|CO |BTE |N| | .00 | .00 |15 |1 |001-123456 |ACAA
I| 20|LEES03 |TINTED OZ | 2.00-| 6.50 | | 4.48 |1 | 1|D |I|FL |LTGE |N| | .00 | .00 |45 |1 |001-234555 |JEE
I| 20|LEES03 |TINTED OZ | 2.00-| 6.50 | | 4.48 |1 | 1|D |I|FL |LTGE |N| | .00 | .00 |45 |1 | |JEE
I| 20|LEES03 |TINTED OZ | 2.00-| 6.50 | | 4.48 |1 | 1|D |I|FL |LTGE |N| | .00 | .00 |45 |1 |001-234552 |JEE
Here is the simple regex that I am using, where I capture the field content in the 2nd capture group:
^I(\|.*?){20}(\d{3}-\d{6})
I have read about catastrophic backtracking, but my regex skills are limited and I do not understand how to rewrite this regex to avoid it.
Help would be appreciated.
Upvotes: 3
Views: 247
Reputation: 89547
IMO, a better way is to split the string on pipes and then check the first and the 21st fields. An example on the command line with the autosplit switch -a:
perl -F'\|' -anE'say $& if $F[0] eq "I" && $F[20]=~/\d{3}-\d{6}/' file
Example in a script:
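use strict;
use warnings;
use feature qw(say);

while (<DATA>) {
    # Split the record on literal pipes; fields are zero-indexed.
    my @F = split /\|/;
    # Only "I" records with at least 21 fields can carry the number.
    next unless $F[0] eq 'I' && defined $F[20];
    # Print the number when the 21st field has the nnn-nnnnnn shape.
    say $1 if $F[20] =~ /(\d{3}-\d{6})/;
}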
__DATA__
T|112|| | | |AZ |D |1 | 1|
I| 10|ACAA |BY CORD EACH | 10.00-| .99 | | .36 |1 | 1|D |I|CO |BTE |N| | .00 | .00 |15 |1 |001-123456 |ACAA
I| 20|LEES03 |TINTED OZ | 2.00-| 6.50 | | 4.48 |1 | 1|D |I|FL |LTGE |N| | .00 | .00 |45 |1 |001-234555 |JEE
I| 20|LEES03 |TINTED OZ | 2.00-| 6.50 | | 4.48 |1 | 1|D |I|FL |LTGE |N| | .00 | .00 |45 |1 | |JEE
I| 20|LEES03 |TINTED OZ | 2.00-| 6.50 | | 4.48 |1 | 1|D |I|FL |LTGE |N| | .00 | .00 |45 |1 |001-234552 |JEE
Upvotes: 5
Reputation: 784938
You can avoid catastrophic backtracking by using a negated character class instead of .*?:
^I(?:\|[^|]*){20}(\d{3}-\d{6})
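[^|]* matches zero or more characters that are not |, so each of the 20 repetitions is confined to exactly one field. In the original pattern, .*? can also match pipes, which lets the engine try many different ways of dividing the line among the 20 groups before it gives up on a non-matching line.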
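To see the pattern in action, here is a minimal sketch that runs it over a few of the sample lines from the question. The question does not say which language the regex is used from, so Python is an assumption here, and the names FIELD_21, extract_number and SAMPLE_LINES are made up for the illustration.

```python
import re

# Pattern from this answer: a leading "I", then 20 pipe-delimited fields
# that cannot contain a pipe, then the nnn-nnnnnn number as group 1.
FIELD_21 = re.compile(r"^I(?:\|[^|]*){20}(\d{3}-\d{6})")


def extract_number(line):
    """Return the nnn-nnnnnn value from the 21st field, or None (hypothetical helper)."""
    match = FIELD_21.match(line)
    return match.group(1) if match else None


# Sample records copied from the question.
SAMPLE_LINES = [
    "T|112|| | | |AZ |D |1 | 1|",
    "I| 10|ACAA |BY CORD EACH | 10.00-| .99 | | .36 |1 | 1|D |I|CO |BTE |N| | .00 | .00 |15 |1 |001-123456 |ACAA",
    "I| 20|LEES03 |TINTED OZ | 2.00-| 6.50 | | 4.48 |1 | 1|D |I|FL |LTGE |N| | .00 | .00 |45 |1 | |JEE",
    "I| 20|LEES03 |TINTED OZ | 2.00-| 6.50 | | 4.48 |1 | 1|D |I|FL |LTGE |N| | .00 | .00 |45 |1 |001-234552 |JEE",
]

for line in SAMPLE_LINES:
    number = extract_number(line)
    if number is not None:
        print(number)
```

The T line and the I line with a blank 21st field are skipped. The last [^|]* still backtracks a little inside the 21st field to hand the digits over to the capture group, but it can never cross a pipe, so the work per line stays small.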
Upvotes: 5