Reputation: 361
I'm stuck here. I'm not sure why my regex won't work. I have a pipe-delimited text file with a series of columns, and I need to extract the 3rd column.
A|B|C|D|E|F|G|H|I
2011-03-03 00:00:00.0|1|60510271|254735|27751|BBB|1|-0.1619023623|-0.009865904
2011-03-03 00:00:00.0|1|60510270|254735|27751|B|3|-0.0064786612|-0.0063739185
2011-03-03 00:00:00.0|1|60510269|254735|27751|B|3|-0.0084998226|-0.009244384
$> head foo | perl -pi -e 's/^(.*)\|(.*)\|(.*)\|(.*)$/$3/g'
-0.1619023623
-0.0064786612
-0.0084998226
Clearly that is not the correct column being output.
Thoughts?
Upvotes: 1
Views: 3030
Reputation: 1
(?<=\|)\d{8}
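Maybe this would work: (?<=\|) is a positive lookbehind for a | character, followed by 8 digits (\d{8}). It relies on the 3rd column being the first field after a pipe that starts with 8 digits, which holds for the sample data.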
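A minimal Perl sketch of the lookbehind idea, run against the sample rows from the question. The subroutine name third_col_by_lookbehind is hypothetical, and the approach only works while no earlier field after a pipe starts with 8 digits.

```perl
use strict;
use warnings;

# Lookbehind sketch: (?<=\|) asserts that a "|" comes right before the
# match without consuming it, and \d{8} then matches 8 digits.
# Caveat: this finds the 3rd column only because, in the sample data,
# it is the first field after a pipe that starts with 8 digits.
sub third_col_by_lookbehind {
    my ($line) = @_;
    return $line =~ /(?<=\|)(\d{8})/ ? $1 : undef;
}

my @lines = (
    '2011-03-03 00:00:00.0|1|60510271|254735|27751|BBB|1|-0.1619023623|-0.009865904',
    '2011-03-03 00:00:00.0|1|60510270|254735|27751|B|3|-0.0064786612|-0.0063739185',
    '2011-03-03 00:00:00.0|1|60510269|254735|27751|B|3|-0.0084998226|-0.009244384',
);

# Prints 60510271, 60510270 and 60510269, one per line.
print third_col_by_lookbehind($_), "\n" for @lines;
```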
Upvotes: 0
Reputation: 11
My first thought was Text::CSV (mentioned by Matt B), but if the data looks like the example, I'd say split is the right choice.
Untested:
$> head foo | perl -le 'while (<>) { print +(split m{\|})[2]; }'
If you really want a regex I would use something like this:
s{^ [^\|]* \| [^\|]* \| ([^\|]*) \| .*$}{$1}gx;
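A small sketch comparing the two suggestions above on one sample row. The subroutine names are hypothetical. The pipe must be escaped in the split pattern: an unescaped m{|} is an alternation of two empty patterns, which matches the empty string, so split would break the line into single characters.

```perl
use strict;
use warnings;

# 1) split on a literal "|" and take index 2 (the 3rd column).
sub third_col_split {
    my ($line) = @_;
    return (split /\|/, $line)[2];
}

# 2) The regex from the answer (the /g flag is dropped, since the pattern is
#    anchored and can only match once): skip two runs of non-pipe characters,
#    capture the third, and drop the rest of the line. /x allows the spaces.
#    If the pattern does not match, the line is returned unchanged.
sub third_col_regex {
    my ($line) = @_;
    (my $copy = $line) =~ s{^ [^\|]* \| [^\|]* \| ([^\|]*) \| .*$}{$1}x;
    return $copy;
}

my $line = '2011-03-03 00:00:00.0|1|60510271|254735|27751|BBB|1|-0.1619023623|-0.009865904';
print third_col_split($line), "\n";   # 60510271
print third_col_regex($line), "\n";   # 60510271
```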
Upvotes: 1
Reputation: 25599
Normally, it's easier and simpler (KISS) NOT to use a regex for file formats that have structured delimiters. Just split the string on the "|" delimiter and take the 3rd field.
awk -F"|" '{print $3}' file
With Ruby (1.9+):
ruby -F"\|" -ane 'puts $F[2]' file
With Perl, it's similar to the Ruby one-liner above:
perl -F"\|" -ane 'print $F[2]."\n"' file
Upvotes: 4
Reputation: 58521
You need to make your pattern non-greedy, so:
's/^(.*?)\|(.*?)\|(.*?)\|(.*)$/$3/g'
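To see why the non-greedy version fixes it, here is a sketch that runs both patterns against the first sample row and keeps group 3 from each. The variable names are illustrative.

```perl
use strict;
use warnings;

my $line = '2011-03-03 00:00:00.0|1|60510271|254735|27751|BBB|1|-0.1619023623|-0.009865904';

# The question's pattern: the first (.*) is greedy, so it swallows the line
# and backtracks only as far as needed to leave three pipes for the rest of
# the pattern. Group 3 ends up as the second-to-last column.
my $greedy = $line =~ /^(.*)\|(.*)\|(.*)\|(.*)$/ ? $3 : undef;

# Non-greedy: each (.*?) stops at the earliest pipe that lets the match
# succeed, so groups 1-3 are the first three columns and group 4 is the rest.
my $lazy = $line =~ /^(.*?)\|(.*?)\|(.*?)\|(.*)$/ ? $3 : undef;

print "greedy:     $greedy\n";   # greedy:     -0.1619023623
print "non-greedy: $lazy\n";     # non-greedy: 60510271
```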
Upvotes: 1
Reputation: 359776
How about using a real parser instead of hacking together a regex? Text::CSV should do the job.
use Text::CSV;
my $csv = Text::CSV->new({sep_char => "|"});
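A fuller sketch of the Text::CSV route, assuming the module is installed from CPAN (it is not part of core Perl). It parses one sample row with parse() and fields(); the helper name third_col_csv is hypothetical.

```perl
use strict;
use warnings;
use Text::CSV;   # CPAN module; uses Text::CSV_XS when installed, else Text::CSV_PP

# Configure the parser for pipe-separated rather than comma-separated data.
my $csv = Text::CSV->new({ sep_char => '|', binary => 1 })
    or die 'Cannot create Text::CSV object: ' . Text::CSV->error_diag();

# Parse one line and return its 3rd field (index 2), or undef on a parse error.
sub third_col_csv {
    my ($line) = @_;
    return undef unless $csv->parse($line);
    my @fields = $csv->fields();
    return $fields[2];
}

# Reading a whole file or STDIN would use getline instead, e.g.
#   while (my $row = $csv->getline(\*STDIN)) { print $row->[2], "\n" }

print third_col_csv('2011-03-03 00:00:00.0|1|60510271|254735|27751|BBB|1|-0.1619023623|-0.009865904'), "\n";
```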
Upvotes: 1
Reputation: 19971
.* will by default match as much as it can, so your RE is picking out the last three columns (and everything before) rather than the first three (and everything after). You can avoid this in (at least) two ways: (1) instead of .*, look for [^|]*, or (2) make your repetition operators non-greedy: .*? instead of .*.
(Or you could explicitly split the string instead of matching the whole thing with a single RE. You might want to try both approaches and see which performs better, if it matters. Splitting is likely to give longer but clearer code.)
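A sketch of approach (1) generalized to any column, alongside the split alternative. Both helpers (nth_col_regex, nth_col_split) are hypothetical names; column numbers are 1-based, and passing -1 as the split limit keeps trailing empty fields, so an empty last column still counts.

```perl
use strict;
use warnings;

# Approach (1): [^|]* can never run past a pipe, so greediness stops
# mattering. (?:[^|]*\|){$skip} skips $skip columns (each with its
# trailing pipe) and ([^|]*) captures the next one. $n is 1-based.
sub nth_col_regex {
    my ($line, $n) = @_;
    chomp(my $copy = $line);   # drop a trailing newline, if any
    my $skip = $n - 1;
    return $copy =~ /^(?:[^|]*\|){$skip}([^|]*)/ ? $1 : undef;
}

# The split alternative. A limit of -1 keeps trailing empty fields.
sub nth_col_split {
    my ($line, $n) = @_;
    chomp(my $copy = $line);
    return (split /\|/, $copy, -1)[$n - 1];
}

my $line = '2011-03-03 00:00:00.0|1|60510271|254735|27751|BBB|1|-0.1619023623|-0.009865904';
print nth_col_regex($line, 3), "\n";   # 60510271
print nth_col_split($line, 3), "\n";   # 60510271
```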
Upvotes: 1