Reputation: 9807
I have tried to extract a number as given below but nothing is printed on screen:
echo "This is an example: 65 apples" | sed -n 's/.*\([0-9]*\) apples/\1/p'
However, I get 65 if I match the two digits explicitly, as shown below:
echo "This is an example: 65 apples" | sed -n 's/.*\([0-9][0-9]\) apples/\1/p'
65
How can I match a number when I don't know in advance how many digits it has? For example, it could be 2344 instead of 65.
Upvotes: 24
Views: 44257
Reputation: 525
The Rust tool ripgrep is now a nice alternative. It is fast, runs on Windows, Linux and macOS, and supports a Perl-like regex syntax, including \d.
echo "This is an example: 65 apples" | rg '\d+' -o
65
The documentation for the -o option states:
-o, --only-matching Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Upvotes: 0
Reputation: 369
A simple way to extract all numbers from a string:
echo "1213 test 456 test 789" | grep -P -o "\d+"
And the result:
1213
456
789
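If your grep was built without PCRE support, -P is not available. A minimal sketch of the same idea with extended regular expressions, assuming a grep that supports the -o extension (GNU and BSD grep both do); the variable name numbers is only for illustration:

```shell
# [0-9]+ in ERE syntax stands in for the Perl-style \d+.
numbers=$(echo "1213 test 456 test 789" | grep -E -o "[0-9]+")
echo "$numbers"
```

This should print the same three lines as above.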
Upvotes: 3
Reputation: 7735
echo "This is an example: 65 apples" | ssed -nR -e 's/.*?\b([0-9]*) apples/\1/p'
You will, however, need super-sed (ssed) for this to work. The -R option enables Perl-style regular expressions.
Upvotes: 0
Reputation: 56915
It's because your first .* is greedy, and your [0-9]* allows 0 or more digits.
Hence the .* gobbles up as much as it can (including the digits) and the [0-9]* matches nothing.
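To make this visible, here is a small sketch that captures both parts of the original pattern and prints each in brackets. The variable names input and greedy are only for illustration:

```shell
# Group 1 is what .* matched, group 2 is what [0-9]* matched.
input="This is an example: 65 apples"
greedy=$(printf '%s\n' "$input" | sed -n 's/\(.*\)\([0-9]*\) apples/[\1][\2]/p')
printf '%s\n' "$greedy"
# prints: [This is an example: 65][]
```

The empty second pair of brackets is the [0-9]* group: the digits have already been consumed by .*.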
You can do:
echo "This is an example: 65 apples" | sed -n 's/.*\b\([0-9]\+\) apples/\1/p'
where I forced [0-9] to match at least one digit (\+), and also added a word boundary (\b) before the digits so that the whole number is matched. Both \+ and \b are GNU sed extensions.
However, it's easier to use grep, where you match just the number:
echo "This is an example: 65 apples" | grep -P -o '[0-9]+(?= +apples)'
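The -P means "perl regex" (so I don't have to worry about escaping the '+').
The -o means "only print the matches".
The (?= +apples) is a lookahead: it matches the digits only when they are followed by one or more spaces and the word apples, without including that text in the output.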
Upvotes: 6
Reputation: 455020
$ echo "This is an example: 65 apples" | sed -r 's/^[^0-9]*([0-9]+).*/\1/'
65
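Note that this pattern picks out the first number on the line, whatever follows it, and that a line with no digits is printed unchanged because there is no -n/p. A quick sketch to check both the variable digit count and the "first number" behavior, using -E (the spelling of -r that BSD sed also accepts); first_number is a hypothetical helper name:

```shell
# Hypothetical helper: print the first run of digits found in its argument.
first_number() {
    printf '%s\n' "$1" | sed -E 's/^[^0-9]*([0-9]+).*/\1/'
}

first_number "This is an example: 2344 apples"   # prints 2344
first_number "10 boxes hold 65 apples"           # prints 10, not 65
```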
Upvotes: 29
Reputation: 54551
What you are seeing is the greedy behavior of regex. In your first example, .* gobbles up all the digits. Something like this does it:
echo "This is an example: 65144 apples" | sed -n 's/[^0-9]*\([0-9]\+\) apples/\1/p'
65144
This way, the leading part of the pattern cannot match any digits. Some regex dialects have a way to ask for non-greedy matching, but I don't believe sed has one.
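For a sed without the GNU \+ extension, the same idea can be sketched in plain POSIX basic regex syntax by writing "one or more digits" as [0-9][0-9]*. The helper name apple_count is hypothetical, and the sketch assumes the line contains no other digits before the number:

```shell
# Hypothetical helper: print the number that directly precedes " apples".
# [0-9][0-9]* is the POSIX BRE spelling of "one or more digits".
apple_count() {
    printf '%s\n' "$1" | sed -n 's/[^0-9]*\([0-9][0-9]*\) apples/\1/p'
}

apple_count "This is an example: 65 apples"     # prints 65
apple_count "This is an example: 2344 apples"   # prints 2344
```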
Upvotes: 3