Uthman

Reputation: 9807

sed extracting group of digits

I tried to extract a number with the command below, but no number is printed on screen:

echo "This is an example: 65 apples" | sed -n  's/.*\([0-9]*\) apples/\1/p'

However, I get '65' if I match the two digits explicitly, as shown below:

echo "This is an example: 65 apples" | sed -n  's/.*\([0-9][0-9]\) apples/\1/p'
65

How can I match a number when I don't know in advance how many digits it has, e.g. it could be 2344 instead of 65?

Upvotes: 24

Views: 44257

Answers (6)

David

Reputation: 525

The Rust tool ripgrep is now a nice alternative. It is fast, runs on Windows, Linux, and macOS, and supports a Perl-like regex syntax (including \d), though without look-around or backreferences by default.

echo "This is an example: 65 apples" | rg '\d+' -o
65

The documentation for the -o option states:

-o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.

Upvotes: 0

Khate

Reputation: 369

A simple way to extract all numbers from a string:

echo "1213 test 456 test 789" | grep -P -o "\d+"

And the result:

1213
456
789

Upvotes: 3

ctrl-alt-delor

Reputation: 7735

echo "This is an example: 65 apples" | ssed -nR -e 's/.*?\b([0-9]*) apples/\1/p'

However, you will need super-sed (ssed) for this to work. The -R option enables Perl-style regular expressions, which is what makes the non-greedy .*? available.

Upvotes: 0

mathematical.coffee

Reputation: 56915

It's because your first .* is greedy, and your [0-9]* allows 0 or more digits. Hence the .* gobbles up as much as it can (including the digits) and the [0-9]* matches nothing.

You can do:

echo "This is an example: 65 apples" | sed -n  's/.*\b\([0-9]\+\) apples/\1/p'

where I forced the [0-9] to match at least one digit, and also added a word boundary before the digits so the whole number is matched. Note that \+ and \b are GNU sed extensions.
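To see the greedy behaviour directly, here is a small sketch that prints what each part of the original pattern captures. The helper name show_groups is made up for illustration, and it assumes a sed that accepts -E (GNU and BSD sed both do), which avoids escaping the parentheses.

```shell
# Hypothetical helper: show what the original pattern captures as [prefix|digits].
show_groups() {
    # $1 is the input line.
    # (.*) is greedy and takes everything up to " apples", digits included;
    # ([0-9]*) is then satisfied by matching zero digits.
    printf '%s\n' "$1" | sed -E -n 's/(.*)([0-9]*) apples/[\1|\2]/p'
}

show_groups "This is an example: 65 apples"
# prints: [This is an example: 65|]
```

The second group is empty, which is why the original command prints only an empty line.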

However, it's easier to use grep, where you match just the number:

echo "This is an example: 65 apples" | grep -P -o '[0-9]+(?= +apples)'

The -P means "perl regex" (so I don't have to worry about escaping the '+').

The -o means "only print the matches".

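The (?= +apples) is a lookahead: it matches only digits that are followed by one or more spaces and the word apples, without including the spaces or the word in the match.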
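If your grep was built without -P support, the lookahead can be approximated in two steps. This is only a sketch: the function name extract_count is made up, and it assumes a grep that supports -o (GNU and BSD grep do, although -o is not in POSIX).

```shell
# Hypothetical helper: emulate the lookahead with two plain ERE matches.
extract_count() {
    # Step 1: keep the digits together with the trailing " apples".
    # Step 2: keep only the leading digits of that match.
    printf '%s\n' "$1" | grep -oE '[0-9]+ +apples' | grep -oE '^[0-9]+'
}

extract_count "This is an example: 65 apples"
extract_count "12 pears and 2344 apples"
```

The first call prints 65 and the second prints 2344; the 12 is ignored because it is not followed by apples.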

Upvotes: 6

codaddict

Reputation: 455020

$ echo "This is an example: 65 apples" | sed -r  's/^[^0-9]*([0-9]+).*/\1/'
65

Upvotes: 29

FatalError

Reputation: 54551

What you are seeing is the greedy behavior of regex. In your first example, .* gobbles up all the digits. Something like this does it:

echo "This is an example: 65144 apples" | sed -n  's/[^0-9]*\([0-9]\+\) apples/\1/p'
65144

This way, the first part of the pattern can't match any digits, so they are left for the capture group. Some regex dialects have a way to ask for non-greedy matching, but I don't believe sed has one.
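One caveat with an unanchored pattern like this: if the line contains an earlier number, the text before the match is left in place, so "12 pears and 2344 apples" would come out as 122344. A possible variant, sketched here under the assumption that sed -E is available and with a made-up function name, anchors the pattern to the whole line:

```shell
# Hypothetical helper: print the number that directly precedes " apples".
extract_apples() {
    # ^(.*[^0-9])?  optional prefix that must end in a non-digit
    # ([0-9]+)      the whole run of digits, however long
    #  apples.*$    the literal suffix and the rest of the line
    printf '%s\n' "$1" | sed -E -n 's/^(.*[^0-9])?([0-9]+) apples.*$/\2/p'
}

extract_apples "This is an example: 65 apples"     # 65
extract_apples "This is an example: 2344 apples"   # 2344
extract_apples "12 pears and 2344 apples"          # 2344
```

Because the prefix has to end in a non-digit, it cannot swallow part of the number, whatever its length.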

Upvotes: 3
