Bijan

Reputation: 8586

Grep: Capture just the number

I am trying to use grep to capture just the number in a string, but I am having difficulty.

echo "There are <strong>54</strong> cities" | grep -o "([0-9]+)"

How am I supposed to make it return just "54"? I have tried the grep command above, and it doesn't work.

echo "You have <strong>54</strong>" | grep -o '[0-9]' almost works, but it prints each digit on a separate line:

5
4

instead of 54

Upvotes: 0

Views: 154

Answers (2)

Khanna111

Reputation: 3913

You need to use the -E option for extended regex support (or use egrep). On my Mac OS X:

$ echo "There are <strong>54</strong> cities" | grep -Eo "[0-9]+"
54
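
A minimal sketch of why -E matters, assuming a POSIX-like shell with GNU or BSD grep (the helper names digits_ere and digits_bre are hypothetical). In the default basic regex (BRE) syntax, + is an ordinary character and ( ) match literal parentheses, so "([0-9]+)" never matches 54. With -E, + means "one or more"; without -E, [0-9][0-9]* expresses the same thing:

```shell
# Hypothetical helpers: each prints every run of digits found on stdin, one per line.

# ERE (-E): "+" means "one or more of the preceding item".
digits_ere() { grep -Eo '[0-9]+'; }

# BRE (no -E): "+" is literal here, so spell "one or more" as "one digit, then zero or more".
digits_bre() { grep -o '[0-9][0-9]*'; }

echo "There are <strong>54</strong> cities" | digits_ere
echo "There are <strong>54</strong> cities" | digits_bre
```

Both calls print 54.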

You also need to consider whether a line can contain more than one number. What should the behavior be then?
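
To illustrate with a made-up input line: grep -o prints every match on its own line, so if only the first number is wanted (an assumption about the requirement), one option is to cut the output down with head:

```shell
# Made-up example line containing two numbers.
line='There are <strong>54</strong> cities and 12 towns'

# Every match, one per line: 54, then 12.
all=$(printf '%s\n' "$line" | grep -Eo '[0-9]+')
printf '%s\n' "$all"

# Only the first match.
first=$(printf '%s\n' "$line" | grep -Eo '[0-9]+' | head -n 1)
printf '%s\n' "$first"
```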

EDIT 1: Since you have now specified that the number must be between <strong> tags, I would recommend using sed. On my platform, grep does not have the -P option for Perl-style regexes, and on my other box that version of grep marks it as experimental, so I would go with sed in this case.

$ echo "There are <strong>54</strong> 12 cities" | sed -rn 's/^.*<strong>\s*([0-9]+)\s*<\/strong>.*$/\1/p'
54

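Here -r enables extended regular expressions in GNU sed; -E is the spelling BSD sed (as on macOS) uses, and GNU sed accepts it too. \s is also a GNU extension rather than POSIX.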
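
If the sed command has to run on both GNU and BSD/macOS sed, a minimal sketch (assuming the number sits inside a <strong> element on the same line; strong_number is a hypothetical name) swaps -r for -E and \s for the POSIX class [[:space:]]:

```shell
# Hypothetical helper: print the number between <strong> and </strong>, if any.
# -E: extended regex (accepted by GNU and BSD sed); -n with the p flag: print only lines that matched.
strong_number() {
    sed -En 's/.*<strong>[[:space:]]*([0-9]+)[[:space:]]*<\/strong>.*/\1/p'
}

echo "There are <strong>54</strong> 12 cities" | strong_number
echo "There are <strong> 54 </strong> 12 cities" | strong_number
```

Both calls print 54. Because the leading .* is greedy, a line with several <strong> numbers yields the last one.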

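EDIT 2: If your version of grep supports PCRE (the -P option), you could also use a positive lookbehind and lookahead. The \K drops the leading whitespace from the printed match, and keeping the trailing \s* inside the lookahead leaves it out as well, so only the digits are printed:

$ echo "There are <strong>54 </strong> 12 cities" | grep -o -P '(?<=<strong>)\s*\K[0-9]+(?=\s*</strong>)'
54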
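
Because -P is not available in every grep build, a script could probe for it and fall back to sed. A sketch, where extract_strong_number is a hypothetical name; \K is the PCRE escape that discards everything matched so far from the reported match:

```shell
# Hypothetical helper: use grep -P when this grep supports it, otherwise fall back to sed -E.
extract_strong_number() {
    # The probe reads its own input; grep exits non-zero (and complains on stderr) if -P is unsupported.
    if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
        grep -oP '<strong>\s*\K[0-9]+(?=\s*</strong>)'
    else
        sed -En 's/.*<strong>[[:space:]]*([0-9]+)[[:space:]]*<\/strong>.*/\1/p'
    fi
}

echo "There are <strong>54 </strong> 12 cities" | extract_strong_number
```

The two branches differ when a line has several <strong> numbers: grep -o prints all of them, while the greedy sed expression prints only the last.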

Upvotes: 0

Gilles Quénot

Reputation: 184965

Don't parse HTML with regex; use a proper parser:

$ echo "There are <strong>54</strong> cities " |
    xmllint --html --xpath '//strong/text()' -

OUTPUT:

54
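
XPath can also do the trimming and selection itself. A sketch, assuming libxml2's xmllint is installed (first_strong_text is a hypothetical name): normalize-space() strips surrounding whitespace, and (//strong)[1] selects only the first <strong> element when there are several:

```shell
# Hypothetical helper: print the trimmed text of the first <strong> element in the HTML on stdin.
first_strong_text() {
    xmllint --html --xpath 'normalize-space((//strong)[1])' -
}

echo "There are <strong> 54 </strong> cities and <strong>12</strong> towns" | first_strong_text
```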

See the question "RegEx match open tags except XHTML self-contained tags" for why.

Upvotes: 1