Reputation: 13588

sed: print only matching group

I want to grab the last two numbers (one int, one float; followed by optional whitespace) and print only them.

Example:

foo bar <foo> bla 1 2 3.4

Should print:

2 3.4

So far, I have the following:

sed -n  's/\([0-9][0-9]*[\ \t][0-9.]*[\ \t]*$\)/replacement/p'

will give me

foo bar <foo> bla 1 replacement

However, if I try to replace it with group 1, the whole line is printed.

sed -n  's/\([0-9][0-9]*[\ \t][0-9.]*[\ \t]*$\)/\1/p'

How can I print only the section of the line that matches the regex in the group?

Upvotes: 201

Answers (5)

Bruno Bronosky

Reputation: 70339

I agree with @kent that this is well suited for grep -o. If you need to extract a group within a pattern, you can do it with a 2nd grep.

# To extract \1 from /xx([0-9]+)yy/
$ echo "aa678bb xx123yy xx4yy aa42 aa9bb" | grep -Eo 'xx[0-9]+yy' | grep -Eo '[0-9]+'
123
4

# To extract \1 from /a([0-9]+)b/
$ echo "aa678bb xx123yy xx4yy aa42 aa9bb" | grep -Eo 'a[0-9]+b' | grep -Eo '[0-9]+'
678
9

I generally cringe when I see 2 calls to grep/sed/awk piped together, but it's not always wrong. While we should exercise our skills of doing things efficiently, "A foolish consistency is the hobgoblin of little minds", and "Real artists ship".

Upvotes: 8

carlin.scott

Reputation: 7265

The cut command is designed for this kind of situation. It splits each line on a single-character delimiter, and you then specify which fields should be output.

For instance: echo "foo bar <foo> bla 1 2 3.4" | cut -d " " -f 6-7

Will result in output of: 2 3.4

-d sets the delimiter

-f selects the range of 'fields' to output, in this case, it's the 6th through 7th chunks of the original string. You can also specify the range as a list, such as 6,7.
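One caveat worth sketching: cut treats every occurrence of the delimiter as a field boundary, so a run of spaces produces empty fields and shifts the numbering. A minimal sketch, assuming the input may contain repeated spaces, squeezes them with tr -s first. The function name last_two_by_position is made up for illustration, and fields 6-7 still assume the line always has exactly seven fields.

```shell
# Hypothetical helper (name is illustrative only).
# tr -s ' ' collapses each run of spaces into one space, so that
# cut's field numbers line up with the visible words again.
last_two_by_position() {
  tr -s ' ' | cut -d ' ' -f 6-7
}

# Input with doubled and tripled spaces between some fields.
echo "foo bar <foo>  bla 1 2   3.4" | last_two_by_position
```

If the number of leading fields can vary, counting from the left like this will not be enough.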

Upvotes: 8

chooban

Reputation: 9256

And for yet another option, I'd go with awk!

echo "foo bar <foo> bla 1 2 3.4" | awk '{ print $(NF-1), $NF; }'

This splits the input (I'm using STDIN here, but your input could just as easily be a file) on whitespace, then prints the last-but-one field followed by the last field. NF holds the number of fields found on the line, so $NF is the last field and $(NF-1) is the one before it.

The benefit is that it doesn't matter if what precedes the last two fields changes: as long as you only ever want the last two, it will continue to work.
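As a small sketch of the same idea with a guard added: the NF >= 2 condition is an assumption about how short lines should be handled (here they are skipped), and the function name last_two_fields is purely illustrative.

```shell
# Hypothetical helper (name is illustrative only).
# With the default field separator, awk splits on runs of spaces and tabs
# and ignores leading/trailing whitespace, so trailing blanks are harmless.
# The NF >= 2 pattern skips empty lines and lines with a single field.
last_two_fields() {
  awk 'NF >= 2 { print $(NF-1), $NF }'
}

# Three input lines: one with trailing spaces, one empty, one with a single field.
printf 'foo bar <foo> bla 1 2 3.4  \n\nsolo\n' | last_two_fields
```

Note that this selects fields by position only; it does not check that they are actually numbers.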

Upvotes: 13

Kent

Reputation: 195059

grep is the right tool for extracting.

Using your example and your regex:

kent$  echo 'foo bar <foo> bla 1 2 3.4'|grep -o '[0-9][0-9]*[\ \t][0-9.]*[\ \t]*$'
2 3.4
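One detail to be aware of: inside a POSIX bracket expression a backslash is literal, so in grep [\ \t] matches a backslash, a space, or the letter t rather than a tab. It works above because the example only contains spaces. A sketch assuming tabs should count as separators too uses the [[:blank:]] class instead; the -o option is not in POSIX but is available in GNU and BSD grep, and the function name last_two_grep is illustrative only.

```shell
# Hypothetical helper (name is illustrative only).
# -o prints only the matched text, -E enables extended regex (+ without backslash).
# [[:blank:]] matches a space or a tab in any POSIX-style regex implementation.
last_two_grep() {
  grep -oE '[0-9]+[[:blank:]]+[0-9.]+[[:blank:]]*$'
}

# The last two numbers are separated by a tab here.
printf 'foo bar <foo> bla 1 12\t3.4\n' | last_two_grep
```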

Upvotes: 96

iruvar

Reputation: 23364

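Match the whole line by adding a .* at the beginning of your regex. The s command replaces only the text that the regex matches and leaves the rest of the line alone, so without the .* the unmatched prefix is still printed. With it, the entire line is replaced with the contents of the group: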

echo "foo bar <foo> bla 1 2 3.4" |
 sed -n  's/.*\([0-9][0-9]*[\ \t][0-9.]*[ \t]*$\)/\1/p'
2 3.4
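One thing to watch for with a leading .* is that it is greedy: if the integer has more than one digit (say 12 3.4), the .* can absorb the leading digits and leave only the last one for the group. A minimal sketch of one way around that, assuming sed -E (extended regex, supported by GNU and BSD sed) is available, requires the group to start either at the beginning of the line or right after a non-digit. The function name extract_last_two is made up for illustration, and unlike the command above this variant leaves trailing whitespace out of the group.

```shell
# Hypothetical helper (name is illustrative only).
# Group 1: either start of line, or any prefix ending in a non-digit,
#          so the greedy .* cannot swallow part of the integer.
# Group 2: the integer, whitespace, and the float; this is what gets printed.
extract_last_two() {
  sed -n -E 's/(^|.*[^0-9])([0-9]+[[:space:]]+[0-9.]+)[[:space:]]*$/\2/p'
}

echo "foo bar <foo> bla 1 2 3.4" | extract_last_two
echo "foo bar 12 3.4" | extract_last_two
```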

Upvotes: 212
