Reputation: 8586
I am trying to use grep to just capture a number in a string but I am having difficulty.
echo "There are <strong>54</strong> cities" | grep -o "([0-9]+)"
How am I supposed to have it return just "54"? I have tried the above grep command and it doesn't work.
echo "You have <strong>54</strong>" | grep -o '[0-9]'
seems to sort of work but it prints
5
4
instead of 54
Upvotes: 0
Views: 154
Reputation: 3913
You need the "-E" option for extended regex support (or use egrep). Without it, grep uses basic regex syntax, where "(", ")" and "+" are matched as literal characters. On my Mac OS X:
$ echo "There are <strong>54</strong> cities" | grep -Eo "[0-9]+"
54
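If "-E" is not an option, the same match can be written as a basic regex. A minimal sketch, assuming a grep that supports "-o" (GNU and BSD grep do, although POSIX does not require it):

```shell
# Basic-regex equivalent of [0-9]+ : one digit followed by zero or more digits.
echo "There are <strong>54</strong> cities" | grep -o '[0-9][0-9]*'
```

This prints 54 on a single line, because the whole run of digits is one match.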
You also need to decide what should happen when a line contains more than one number. With "-o", grep prints every match on its own line.
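A rough sketch of the two most common choices, using a made-up sample line (the "12 towns" part is an assumption for illustration, not from the question):

```shell
# Hypothetical input containing two numbers.
line='There are <strong>54</strong> cities and 12 towns'

# Every number in the line, one per output line.
printf '%s\n' "$line" | grep -Eo '[0-9]+'

# Only the first number: stop after the first match.
printf '%s\n' "$line" | grep -Eo '[0-9]+' | head -n 1
```

If neither behavior is what you want, the pattern has to say which number you mean, for example by anchoring it to the surrounding text.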
EDIT 1: Since you have now specified that the number must sit between <strong> tags, I would recommend using sed. On my platform, grep does not have the "-P" option for Perl-style regexes, and on my other box the grep documentation describes it as an experimental feature, so I would go with sed in this case.
$ echo "There are <strong>54</strong> 12 cities" | sed -rn 's/^.*<strong>\s*([0-9]+)\s*<\/strong>.*$/\1/p'
54
Here "-r" enables extended regex, and "-n" together with the trailing "p" prints only the lines where the substitution matched.
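Note that "-r" and "\s" are GNU sed extensions, so the command may not run as-is with other sed implementations. A possible more portable variant, assuming a sed that accepts "-E" for extended regex (current GNU and BSD versions do) and using the POSIX class [[:space:]] in place of "\s":

```shell
# -E: extended regex; -n plus the p flag: print only lines where the substitution matched.
echo "There are <strong>54</strong> 12 cities" |
  sed -En 's/^.*<strong>[[:space:]]*([0-9]+)[[:space:]]*<\/strong>.*$/\1/p'
```

As with the original command, this prints 54 and ignores the 12 outside the tags.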
EDIT 2: If your version of grep supports the "-P" (PCRE) option, you could also use the following, which relies on a positive lookbehind and a positive lookahead.
$ echo "There are <strong>54 </strong> 12 cities" | grep -o -P "(?<=<strong>)\s*([0-9]+)\s*(?=<\/strong>)"
54
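One caveat: since the "\s*" parts sit outside the lookarounds, the printed match includes any whitespace inside the tags (here a trailing space after 54). A sketch of one way to print the digits only, assuming a grep built with PCRE support; "\K" is a PCRE feature that discards everything matched so far from the reported match:

```shell
# Check for -P support first; not every grep build has it.
if echo 1 | grep -P '1' >/dev/null 2>&1; then
  # \K drops "<strong>" and leading whitespace from the output;
  # the lookahead requires the closing tag without printing it.
  echo "There are <strong>54 </strong> 12 cities" |
    grep -oP '<strong>\s*\K[0-9]+(?=\s*</strong>)'
else
  echo "grep -P is not available here; use the sed approach instead" >&2
fi
```

Where "-P" is available this prints 54 with no surrounding whitespace.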
Upvotes: 0
Reputation: 184965
$ echo "There are <strong>54</strong> cities " |
xmllint --html --xpath '//strong/text()' -
54
See also "RegEx match open tags except XHTML self-contained tags" for why an HTML parser is a safer choice than a regex here.
Upvotes: 1