Gin

Reputation: 31

Regex matching only numbers

I'm having trouble understanding exactly what my regex is doing in the bash shell.

I have the string abcde 12345 67890testing. I want to extract 12345 from this string using sed.

However, using sed -re 's/([0-9]+).*/\1/' on the given string will give me abcde 12345.

Alternatively, using sed -re 's/([\d]+).*/\1/' would actually only extract abcd.

Am I wrong in assuming that the expressions [0-9] and [\d] ONLY capture digits? I have no idea how abcd is being captured, yet the string 67890 is not. Also, why is the space being captured in my first query?
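A quick way to see what a pattern actually matched is to replace the match with itself wrapped in markers; in the replacement, & stands for the whole match. A minimal sketch, assuming a sed that accepts -E (GNU and BSD sed both do):

```shell
#!/bin/sh
s='abcde 12345 67890testing'

# The match starts at the first digit and .* runs to the end of the line;
# "abcde " lies outside the match, so s/// leaves it untouched.
echo "$s" | sed -E 's/([0-9]+).*/[&]/'
# prints: abcde [12345 67890testing]

# [[:digit:]] is the POSIX spelling of a digit class inside brackets.
# Without the g flag only the first run of digits is replaced.
echo "$s" | sed -E 's/[[:digit:]]+/<&>/'
# prints: abcde <12345> 67890testing
```

The first result shows that the space is never captured: it belongs to the unmatched prefix, which s/// keeps. Inside a bracket expression sed does not treat \d as a digit class (the abcd result above is consistent with it matching the letter d); [0-9] or [[:digit:]] is how to write one.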

In addition, sed -re 's/^.*([0-9]+).*/\1/' gives me 0, and here I don't understand at all what the regex is doing. I thought that the expression ^.*[0-9]+ would capture only the first run of digits. However, it matches only the last 0.
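The greedy behaviour becomes visible if the .* part is captured as well. This sketch (again assuming a sed with -E) also shows one way to isolate the first run of digits without \s, by allowing only non-digits before the group:

```shell
#!/bin/sh
s='abcde 12345 67890testing'

# Greedy .* takes as much as it can while the rest still matches,
# leaving [0-9]+ only the last digit (the 0 before "testing").
echo "$s" | sed -E 's/^(.*)([0-9]+).*/[\1][\2]/'
# prints: [abcde 12345 6789][0]

# Only non-digits may precede the group, so the capture starts at the
# first digit.
echo "$s" | sed -E 's/^[^0-9]*([0-9]+).*/\1/'
# prints: 12345
```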

All in all, I'd like to understand where I went wrong in each of these, and how the problem should be solved WITHOUT using [\s] in the regex to isolate the first string of numbers.

Upvotes: 3

Views: 15395

Answers (4)

Want2bExpert

Reputation: 527

Using the cut command is simpler:

echo "abcde 12345 67890testing" | cut -d' ' -f2

Upvotes: 0

BMW

Reputation: 45223

Since others have already provided sed and grep solutions, here is the awk code:

echo "abcde 12345 67890testing"|awk '{for (i=1;i<=NF;i++) if ($i~/^[0-9]+$/) print $i}'
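The loop above prints every field that consists only of digits. If only the first such field is wanted, one small variation (not part of the original answer) is to exit right after printing:

```shell
#!/bin/sh
s='abcde 12345 67890testing'

# Same loop, but stop after the first all-digit field.
echo "$s" | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) { print $i; exit } }'
# prints: 12345

# With two all-digit fields, only the first one is printed.
echo 'abc 12 34 x' | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) { print $i; exit } }'
# prints: 12
```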

Upvotes: 0

anubhava

Reputation: 784898

Assuming s begins with the number (for example s='12345 67890testing'), you can use:

sed 's/^\([0-9]*\).*$/\1/g' <<< "$s"
12345

Or, modifying your own sed command:

sed 's/\([0-9]\+\).*/\1/g' <<< "$s"
12345

Without the extended-regex flag (-r or -E), sed uses basic regular expressions, so +, ( and ) must be escaped as \+, \( and \) (\+ itself is a GNU extension).

With -r it becomes:

sed -r 's/([0-9]+).*/\1/g' <<< "$s"
12345
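The -E/-r flag only changes how the pattern is spelled, not what it matches. A sketch comparing the spellings on the question's full string; the \+ line assumes GNU sed, while [0-9][0-9]* is the portable BRE way to say "one or more digits":

```shell
#!/bin/sh
s='abcde 12345 67890testing'

# BRE: grouping is \( \); \+ is a GNU extension.
echo "$s" | sed 's/\([0-9]\+\).*/\1/'
# Portable BRE: [0-9][0-9]* instead of \+.
echo "$s" | sed 's/\([0-9][0-9]*\).*/\1/'
# ERE via -E (or GNU -r): ( ) and + need no backslash.
echo "$s" | sed -E 's/([0-9]+).*/\1/'
# each line prints: abcde 12345
```

Because the match begins at the first digit, the prefix abcde is kept in every case; a leading [^0-9]* in the pattern is what removes it.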

UPDATE: You don't really need an external utility for this; bash can do it with its own regex matching:

[[ "$s" =~ ^([0-9]+) ]] && echo "${BASH_REMATCH[1]}"
12345
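If the digits are not at the start of the string, dropping the ^ anchor lets bash find the first run of digits anywhere. A minimal sketch, assuming bash 3.2 or later (=~ and BASH_REMATCH are bash features, not POSIX sh):

```shell
#!/usr/bin/env bash
s='abcde 12345 67890testing'

# Without ^ the pattern may match anywhere; bash reports the leftmost
# match, so BASH_REMATCH[0] holds the first run of digits.
if [[ $s =~ [0-9]+ ]]; then
  echo "${BASH_REMATCH[0]}"
fi
# prints: 12345
```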

Upvotes: 1

drolando

Reputation: 507

sed -E 's/([0-9]+).*/\1/g'  <<< "$s" 

The above command means: find a run of digits followed by anything, and replace it with just the digits. So it matches 12345 67890testing and replaces it with 12345.

The text before the match (abcde and the space) is not part of the match, so it is left alone and the final string is abcde 12345.

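If you want only 12345, you can use grep:

grep -Eo '[0-9]+ ' <<< "$s"

The trailing space in the pattern keeps 67890 (which is followed by letters) from matching, but that space is also printed as part of the match. egrep is the older, now obsolescent, spelling of grep -E.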
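Two ways to get 12345 from grep without putting a space in the pattern, assuming a grep that supports -o and -w (GNU and BSD grep both do):

```shell
#!/bin/sh
s='abcde 12345 67890testing'

# -w keeps only matches that form whole words; 67890 is glued to
# "testing", so it is not a whole word and is skipped.
echo "$s" | grep -Eow '[0-9]+'
# prints: 12345

# Or take every run of digits and keep only the first one.
echo "$s" | grep -Eo '[0-9]+' | head -n 1
# prints: 12345
```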

Or with sed you can use:

sed -E 's/[a-zA-Z ]*([0-9]+).*/\1/g'  <<< "$s"

This also drops the letters and the space before the number.

Upvotes: 3
