Reputation: 540
I am searching for a particular pattern in a CSV file. I would like to print the value of the second-to-last column if its value matches [0-9]{5}.
For example, let's say I have file.csv containing only one line of text:
col1,col2,col3,12345,col5
So I'm trying to print 12345. Here is the command I tried:
sed -nr 's/,([0-9]{5}),[^,]*$/\1/p' file.csv
However, this prints col1,col2,col312345.
Then, I tried
sed -nr 's/.*,([0-9]{5}),[^,]*$/\1/p' file.csv
which worked perfectly, printing 12345.
I don't know if I'm misunderstanding sed or just regex in general, but when I test the first regex on www.regex101.com, it behaves as I originally expected it to.
Why did prepending .* to the pattern fix the problem, and why did the first pattern print what it did?
Upvotes: 0
Views: 358
Reputation: 52122
The command s/pattern/replacement/p takes a line that matches pattern, performs the substitution, and then prints the whole line [1]. So, you have this line:
col1,col2,col3,12345,col5
Your pattern /,([0-9]{5}),[^,]*$/ matches the line, but only part of it, specifically the tail ,12345,col5. You substitute that part with the capture group, 12345, and everything before the match stays untouched, so the line is now
col1,col2,col312345
and the p flag prints the whole line.
In your second command, the pattern /.*,([0-9]{5}),[^,]*$/ matches the line as well, but this time the leading .* makes it match the whole line, so you substitute the whole line with the capture group and only 12345 remains.
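To see the difference side by side, here is a small sketch that feeds the sample line to both substitutions. The variable names (line, partial, full) are only for illustration, and it uses -E instead of -r on the assumption that your sed accepts it (current GNU and BSD versions do).

```shell
line='col1,col2,col3,12345,col5'

# Without a leading .*: only ",12345,col5" is matched and replaced,
# so the text before the match stays in the pattern space.
partial=$(printf '%s\n' "$line" | sed -nE 's/,([0-9]{5}),[^,]*$/\1/p')

# With a leading .*: the match covers the whole line,
# so the whole line is replaced by the capture group.
full=$(printf '%s\n' "$line" | sed -nE 's/.*,([0-9]{5}),[^,]*$/\1/p')

printf '%s\n' "$partial"   # col1,col2,col312345
printf '%s\n' "$full"      # 12345
```

This is also why regex101 looked right: it highlights the matched part, whereas sed prints whatever is left of the line after the substitution.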
[1] In sed parlance, the line is loaded into the "pattern space", and you're manipulating the pattern space. At the end of each cycle, the pattern space gets printed automatically unless -n is given; an explicit p command, or the p flag of a successful s command, prints it as well. I think you assumed that the p flag in the s command affects only the substituted part, but it prints the whole pattern space.
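A quick way to check how -n and the p flag interact is the sketch below. The variable names are made up for the example, and -E is again assumed to be available in your sed.

```shell
line='col1,col2,col3,12345,col5'
cmd='s/.*,([0-9]{5}),[^,]*$/\1/p'

# No -n: the p flag prints the pattern space once, and the automatic
# print at the end of the cycle prints it a second time.
twice=$(printf '%s\n' "$line" | sed -E "$cmd")

# With -n: automatic printing is off, so a line where the substitution
# does not happen produces no output at all.
nothing=$(printf '%s\n' 'a,b,c' | sed -nE "$cmd")

printf '%s\n' "$twice"       # 12345 on two lines
printf '[%s]\n' "$nothing"   # []
```

So with -n and the p flag together, you only get output for lines where the substitution succeeded, and what you get is always the entire pattern space.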
Upvotes: 2