Reputation: 31
I am having problems understanding what my regex in bash shell is doing exactly.
I have the string abcde 12345 67890testing. I want to extract 12345 from this string using sed.
However, using sed -re 's/([0-9]+).*/\1/' on the given string will give me abcde 12345.
Alternatively, using sed -re 's/([\d]+).*/\1/' would actually only extract abcd.
Am I wrong in assuming that the expressions [0-9] and [\d] ONLY capture digits? I have no idea how abcd is being captured yet the string 67890 is not. Plus, I want to know why the space is being captured in my first query.
In addition, sed -re 's/^.*([0-9]+).*/\1/' gives me 0. In this instance, I completely do not understand what the regex is doing. I'd thought that the expression ^.*[0-9]+ would only capture the first instance of a string of only numbers. However, it's matching only the last 0.
All in all, I'd like to understand how I am wrong about all of these, and how the problem should be solved WITHOUT using [\s] in the regex to isolate the first string of numbers.
Upvotes: 3
Views: 15395
Reputation: 527
Using the cut command is simpler:
echo "abcde 12345 67890testing" | cut -d' ' -f2
Upvotes: 0
Reputation: 45223
Since others have already provided solutions with sed and grep, here is an awk version:
echo "abcde 12345 67890testing"|awk '{for (i=1;i<=NF;i++) if ($i~/^[0-9]+$/) print $i}'
Upvotes: 0
Reputation: 784898
You can use:
sed 's/^[^0-9]*\([0-9]*\).*$/\1/' <<< "$s"
12345
Or else, modifying your sed:
sed 's/[^0-9]*\([0-9]\+\).*/\1/' <<< "$s"
12345
You need to escape +, ( and ) in sed when the extended regex flag (-r or -E) is not used.
With -r it will be:
sed -r 's/[^0-9]*([0-9]+).*/\1/' <<< "$s"
12345
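As a rough sketch of the escaping difference, assuming s holds the string from the question: the basic-regex form below spells + as \{1,\} (which is the POSIX spelling, whereas \+ is a GNU extension), and the extended form uses -E. The variable names bre and ere are only illustrative.

```shell
s='abcde 12345 67890testing'

# Basic regex (sed's default): grouping needs \( \), and "one or more"
# is written as the interval \{1,\}.
bre=$(printf '%s\n' "$s" | sed 's/[^0-9]*\([0-9]\{1,\}\).*/\1/')

# Extended regex (-E): ( ) and + are special without backslashes.
ere=$(printf '%s\n' "$s" | sed -E 's/[^0-9]*([0-9]+).*/\1/')

printf '%s\n%s\n' "$bre" "$ere"
# prints 12345 twice
```

Both commands describe the same pattern; only the amount of backslash escaping differs.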
UPDATE: You don't really need any external utility for this as you can do this in BASH itself using its regex capabilities:
[[ "$s" =~ ([0-9]+) ]] && echo "${BASH_REMATCH[1]}"
12345
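A minimal sketch of the pure-bash approach, assuming bash (not plain sh) and the string from the question. It also shows that a pattern anchored with ^ does not match here, because the string starts with letters. The names first and anchored are only illustrative.

```shell
s='abcde 12345 67890testing'

# Unanchored: bash finds the leftmost run of digits anywhere in $s.
# BASH_REMATCH[0] holds the whole matched text.
if [[ $s =~ [0-9]+ ]]; then
  first=${BASH_REMATCH[0]}
else
  first=''
fi

# Anchored with ^: the digits would have to sit at the very start of $s.
if [[ $s =~ ^[0-9]+ ]]; then
  anchored=yes
else
  anchored=no
fi

printf '%s %s\n' "$first" "$anchored"
# prints: 12345 no
```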
Upvotes: 1
Reputation: 507
sed -E 's/([0-9]+).*/\1/g' <<< "$s"
The above command means: find a sequence of digits followed by anything, and replace the whole match with only the digits. So it matches 12345 67890testing and replaces it with only 12345.
The leading abcde and the space are not part of the match, so they are left untouched, and the final string will be abcde 12345.
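A small sketch of this matching behaviour, using the string and two of the commands from the question (with -E in place of -re). The variable names are only illustrative.

```shell
s='abcde 12345 67890testing'

# The match begins at the first digit, so "abcde " lies outside the match
# and sed leaves it in place.
kept_prefix=$(printf '%s\n' "$s" | sed -E 's/([0-9]+).*/\1/')

# A leading ^.* is greedy: it swallows as much as it can and leaves the
# group only the single last digit it needs in order to match.
last_digit=$(printf '%s\n' "$s" | sed -E 's/^.*([0-9]+).*/\1/')

printf '%s\n%s\n' "$kept_prefix" "$last_digit"
# prints "abcde 12345" and then "0"
```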
If you want to get only 12345, you can use grep and keep the first match:
grep -oE '[0-9]+' <<< "$s" | head -n 1
Or with sed you can use:
sed -E 's/[a-zA-Z ]*([0-9]+).*/\1/g' <<< "$s"
This drops the letters and spaces before the numbers.
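If the text before the number may contain characters other than letters and spaces, one possible variation is to skip any non-digits with [^0-9]*. This is only a sketch; first_number is a hypothetical helper name, not an existing command.

```shell
# first_number: hypothetical helper that prints the first run of digits
# found in its argument.
first_number() {
  # ^[^0-9]* consumes any leading non-digits (letters, spaces, punctuation),
  # the group keeps the first run of digits, and .* discards the rest.
  printf '%s\n' "$1" | sed -E 's/^[^0-9]*([0-9]+).*/\1/'
}

first_number 'abcde 12345 67890testing'
first_number 'ab_c-12345 67890testing'
# both calls print 12345
```

Note that a line containing no digits at all is printed unchanged, because the substitution never matches.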
Upvotes: 3