RanRag

Reputation: 49567

How to extract text from a string using sed?

My example string is as follows:

This is 02G05 a test string 20-Jul-2012

From the above string I want to extract 02G05. To do that, I tried the following regex with sed:

$ echo "This is 02G05 a test string 20-Jul-2012" | sed -n '/\d+G\d+/p'

But the above command prints nothing. I believe the reason is that sed cannot match anything against the pattern I supplied.

So my question is: what am I doing wrong here, and how do I correct it?

When I try the same string and pattern with Python, I get the expected result:

>>> re.findall(r'\d+G\d+',st)
['02G05']
>>>

Upvotes: 130

Views: 307168

Answers (7)

Gauthier

Reputation: 41945

I know the question asks with sed, but since it is tagged bash, I want to point out that you don't need grep or sed:

#!/usr/bin/env bash

str="This is 02G05 a test string 20-Jul-2012"
regex="([0-9]+)G([0-9]+)"

if [[ "$str" =~ $regex ]]
then
    echo "${BASH_REMATCH[0]}"   # whole match
    echo "${BASH_REMATCH[1]}"   # first capture group
    echo "${BASH_REMATCH[2]}"   # second capture group
fi

bash has its own regex matching, and it also supports capture groups.

Result:

02G05
02
05

See this answer for more details.

Upvotes: 1

tripleee

Reputation: 189387

The pattern \d might not be supported by your sed. Try [0-9] or [[:digit:]] instead.

To only print the actual match (not the entire matching line), use a substitution.

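sed -n 's/.*[^0-9]\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p'

The parentheses capture the text they match into a back reference. Here, the first (and only) parentheses capture the string we want to keep, and we replace the entire line with just the captured string \1, and print the resulting line. (The p option says to print the resulting line after performing a successful substitution, and the -n option prevents sed from performing its normal printing of every other line.) The [^0-9] before the group keeps the greedy .* from consuming the leading digits of the match; without it, the sample line would yield 2G05 instead of 02G05. The trade-off is that the match must be preceded by a non-digit character, so it cannot sit at the very start of the line.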
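
As a minimal sketch, the substitution approach can be wrapped in a small shell function. The name extract_code is hypothetical, and the [^0-9] guard in front of the group is an assumption that the match never starts at the first character of the line; it is there so the greedy .* cannot eat the leading digits.

```shell
# extract_code is a hypothetical helper name, not part of sed.
extract_code() {
    # -n plus the p flag: print only lines where the substitution succeeded.
    # [^0-9] stops the greedy .* from consuming the leading digits of the match.
    printf '%s\n' "$1" |
        sed -n 's/.*[^0-9]\([0-9][0-9]*G[0-9][0-9]*\).*/\1/p'
}

extract_code "This is 02G05 a test string 20-Jul-2012"
```

Lines without a match produce no output at all, which makes the function easy to use in a pipeline or a test such as [ -n "$(extract_code "$line")" ].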

Upvotes: 135

aotherix

Reputation: 111

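We can use sed -En to simplify the regular expression, where:

n: suppress automatic printing of pattern space
E: use extended regular expressions in the script

The [0-9][0-9]+ before G demands at least two digits, which stops the greedy .* from swallowing the leading 0. A prefix longer than two digits would still be cut down to its last two.

$ echo "This is 02G05 a test string 20-Jul-2012" | sed -En 's/.*([0-9][0-9]+G[0-9]+).*/\1/p'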

02G05

Upvotes: 1

mVChr

Reputation: 50185

How about using grep -E?

echo "This is 02G05 a test string 20-Jul-2012" | grep -Eo '[0-9]+G[0-9]+'
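
A short sketch of how this behaves when a line holds more than one match: with -o, grep prints each match on its own line, so head can pick the first. The helper names all_codes and first_code are hypothetical, and -o is assumed to be available (it is a GNU/BSD extension rather than POSIX).

```shell
# all_codes and first_code are hypothetical helper names.
all_codes() {
    # -E: extended regex, -o: print only the matched text, one match per line.
    printf '%s\n' "$1" | grep -Eo '[0-9]+G[0-9]+'
}

first_code() {
    # Keep only the first match found.
    all_codes "$1" | head -n 1
}

all_codes "This is 02G05 and 11G7 on one line"
first_code "This is 02G05 and 11G7 on one line"
```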

Upvotes: 139

Tim Savannah

Reputation: 19

Try using rextract. It will let you extract text using a regular expression and reformat it.

Example:

$ echo "This is 02G05 a test string 20-Jul-2012" | ./rextract '([\d]+G[\d]+)' '${1}'

2G05

Upvotes: 0

Zsolt Botykai

Reputation: 51603

Try this instead:

echo "This is 02G05 a test string 20-Jul-2012" | sed 's/.* \([0-9]\+G[0-9]\+\) .*/\1/'

But note that if there are two matches on one line, it prints the last one, because the leading .* is greedy.
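
The following sketch illustrates that behaviour. It is a variant of the command above rather than a copy: it spells the quantifier as [0-9][0-9]* so it does not rely on the GNU \+ extension, and it adds -n and p so non-matching lines print nothing. The name last_code is hypothetical.

```shell
# last_code is a hypothetical helper name.
last_code() {
    # The leading ".* " is greedy, so it runs up to the LAST space-delimited match.
    printf '%s\n' "$1" |
        sed -n 's/.* \([0-9][0-9]*G[0-9][0-9]*\) .*/\1/p'
}

last_code "a 02G05 b 11G7 c"
```

Because the pattern requires a space on both sides, a match at the very start or end of the line is not found.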

Upvotes: 8

Dennis Williamson

Reputation: 360085

sed doesn't recognize \d; use [[:digit:]] instead. You will also need to escape the + (as \+) or use the -r switch (-E on OS X).

Note that [0-9] works as well for Arabic-Hindu numerals.
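
As a hedged sketch, here are two ways to write the original line-matching command so that it does match. The function names are hypothetical. The basic-syntax version uses \{1,\}, the POSIX spelling of "one or more"; GNU sed also accepts \+ there as an extension.

```shell
# match_lines_ere and match_lines_bre are hypothetical helper names.

# Extended syntax (-E): + is a quantifier without a backslash.
match_lines_ere() {
    sed -En '/[[:digit:]]+G[[:digit:]]+/p'
}

# Basic syntax: \{1,\} means "one or more" in POSIX basic regular expressions.
match_lines_bre() {
    sed -n '/[[:digit:]]\{1,\}G[[:digit:]]\{1,\}/p'
}

echo "This is 02G05 a test string 20-Jul-2012" | match_lines_ere
```

Both functions print the whole matching line, just as the command in the question was meant to; extracting only the match still needs a substitution or grep -o.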

Upvotes: 6
