Captain_Obvious

Reputation: 540

sed not printing matching group as expected

I am searching for a particular pattern in a CSV file. I would like to print the value of the second-to-last column, but only if that value matches [0-9]{5}.

For example, let's say I have file.csv containing only one line of text:

col1,col2,col3,12345,col5

So I'm trying to print 12345. Here is the command I tried:

sed -nr 's/,([0-9]{5}),[^,]*$/\1/p' file.csv

However, this prints col1,col2,col312345.

Then, I tried

sed -nr 's/.*,([0-9]{5}),[^,]*$/\1/p' file.csv

which worked perfectly, printing 12345.

I don't know if I'm misunderstanding sed or just regex in general, but when I test the first regex on www.regex101.com, it behaves as I originally expected it to.

Why did prepending .* to the pattern fix the problem, and why did the first pattern print what it did?

Upvotes: 0

Views: 358

Answers (1)

Benjamin W.

Reputation: 52122

The command s/pattern/replacement/p takes a line that matches pattern, performs the substitution, and then prints the whole line, not just the part that was replaced (see note 1 below). So, you have this line:

col1,col2,col3,12345,col5

Your pattern /,([0-9]{5}),[^,]*$/ matches only part of the line, namely ,12345,col5. You substitute that part with the capture group, 12345, while the unmatched prefix col1,col2,col3 stays untouched, so the line is now

col1,col2,col312345

and the p flag prints the whole line.

In your second command, the pattern /.*,([0-9]{5}),[^,]*$/ matches the line as well, but this time the leading .* consumes the prefix col1,col2,col3, so the match covers the whole line, and you substitute the whole line with the capture group.
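To see what each pattern actually matches, here is a small sketch that wraps the matched text in brackets instead of replacing it with the capture group. It assumes a sed that accepts -E for extended regexes (in GNU sed this behaves like the -r used in the question), and it feeds the sample line through printf rather than reading file.csv.

```shell
# Sample line from the question, fed through printf instead of file.csv.
line='col1,col2,col3,12345,col5'

# "&" in the replacement stands for the entire matched text,
# so the brackets mark exactly what each pattern matched.
partial=$(printf '%s\n' "$line" | sed -E 's/,([0-9]{5}),[^,]*$/[&]/')
whole=$(printf '%s\n' "$line" | sed -E 's/.*,([0-9]{5}),[^,]*$/[&]/')

echo "$partial"   # col1,col2,col3[,12345,col5]
echo "$whole"     # [col1,col2,col3,12345,col5]
```

Everything outside the brackets survives the substitution, which is why the first command leaves col1,col2,col3 in front of 12345.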


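Note 1: In sed parlance, the line is loaded into the "pattern space", and you're manipulating the pattern space. At the end of each cycle, the pattern space is printed automatically unless -n is given (as it is in your commands), and it is also printed whenever an explicit p command or p flag is executed. I think you assumed that the p flag in the s command prints only the substituted part, but it prints the whole pattern space. That is presumably also why the first regex looks right on regex101: it does match ,12345,col5 with 12345 as the group; sed simply prints everything that is left on the line after the substitution.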
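The interaction between automatic printing, -n, and the p flag can be checked with a throwaway input. This is only an illustrative sketch: the line a,b and the replacement X are made up for the demonstration.

```shell
# Without -n: the p flag prints the pattern space once, and the
# automatic print at the end of the cycle prints it a second time.
twice=$(printf 'a,b\n' | sed 's/a/X/p')

# With -n: automatic printing is off, so only the p flag prints.
once=$(printf 'a,b\n' | sed -n 's/a/X/p')

# With -n and no match: no substitution happens, so the p flag
# never fires and nothing is printed.
none=$(printf 'a,b\n' | sed -n 's/z/X/p')

printf '%s\n' "$twice"    # X,b on two lines
printf '%s\n' "$once"     # X,b
printf '[%s]\n' "$none"   # []
```

In every case that prints, the output is the full pattern space X,b, never just the replaced X.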

Upvotes: 2
