matt

Reputation: 83

How to match a regex 1 to 3 times in a sed command?

Problem

I want to use sed to extract any run of one to three digits that is followed by a %, without the % itself.

What I tried

I guess the following regex should match the right pattern: [0-9]{1,3}%.
Then I can use this sed command to capture the digits and print only them:
sed -nE 's/.*([0-9]{1,3})%.*/\1/p'

Example

However, when I run it, it prints:

$ echo "100%" | sed -nE 's/.*([0-9]{1,3})%.*/\1/p'
0

instead of

100

Obviously, there's something wrong with my sed command, and I think the problem comes from here:

[0-9]{1,3}

which apparently doesn't do what I want it to do.

edit:

Solution

The .* at the start of sed -nE 's/.*([0-9]{1,3})%.*/\1/p' is greedy and "ate" the first two digits.

The right way to write it, according to Wiktor's answer, is:

sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'

Upvotes: 4

Views: 3075

Answers (4)

udippel

Reputation: 145

I came here while searching for something similar: a large number of files needed their trailing number (one or two digits, including the underscore) removed, for example example_10.mp3 to example.mp3. I will spare everyone the details of my efforts: I read some twenty pages and tried every combination of +, * and ?, before and after, with and without parentheses. Of course I also tried the one mentioned above, [0-9]{1,2}, which can be found in many places and didn't work for me. In the end, the solution was /_+([0-9]).mp3/.mp3. Note that this is bash pattern substitution, where +([0-9]) is an extended glob meaning one or more digits, not a regex.

I also tried for your description:

$ shopt -s extglob   # needed for +(...) patterns in bash
$ t=aaa234%bbbbb
$ echo "$t" "${t/+([0-9])%/}"
aaa234%bbbbb aaabbbbb

$ t=aaa2%bbbbb
$ echo "$t" "${t/+([0-9])%/}"
aaa2%bbbbb aaabbbbb

I think this is what you were looking for?

Upvotes: 1

potong

Reputation: 58483

This might work for you (GNU sed):

sed -En 's/.*\<([0-9]{1,3})%.*/\1/p' file

This is a filtering exercise, so use the -n option.

Use a capture group to match 1 to 3 digits followed by %, and print the back reference if the substitution succeeds.

N.B. The \< ensures the digits start on a word boundary; \b could also be used. The -E option is employed to reduce the number of backslashes that would otherwise be needed to quote the (, ), { and } metacharacters.
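
To illustrate the word-boundary behaviour, here is a small sketch. It assumes GNU sed (\< is a GNU extension), and the function name extract_pct and the sample strings are made up for this example.

```shell
# Hypothetical wrapper around the command above; reads from stdin instead of a file.
# Assumes GNU sed, since \< is a GNU extension.
extract_pct() {
  sed -En 's/.*\<([0-9]{1,3})%.*/\1/p'
}

echo '100%'         | extract_pct   # digits start the line, which is a word boundary
echo 'cpu 42% used' | extract_pct   # digits start a new word after the space
echo 'abc1234%'     | extract_pct   # no word starts with 1 to 3 digits followed by %
```

With the word boundary in place, the leading .* can no longer swallow part of the number, and a line such as abc1234% is simply not printed.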

Upvotes: 1

anubhava

Reputation: 785581

It will be easier to use a cut + grep option:

echo "abc 100%" | cut -d% -f1 | grep -oE '[0-9]{1,3}'
100

echo "100%" | cut -d% -f1 | grep -oE '[0-9]{1,3}'
100

Or else you may use this awk:

echo "100%" | awk 'match($0, /[0-9]{1,3}%/){print substr($0, RSTART, RLENGTH-1)}'
100

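Or else, if you have GNU grep, use the -P (PCRE) option (GNU grep is installed under the name ggrep on some systems, e.g. via Homebrew on macOS; elsewhere it is usually plain grep):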

echo "abc 100%" | ggrep -oP '[0-9]{1,3}(?=%)'
100
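
One thing to keep in mind with the cut + grep pipeline is that it reports every digit run before the first %, not only the one touching it. The sketch below compares it with a variation that keeps the % in the pattern and strips it afterwards with tr. The variation and both function names are my own illustration, not part of the answer above, and grep -o is assumed to be available (it is in GNU and BSD grep).

```shell
# The cut + grep pipeline from above, wrapped in a function (hypothetical name).
pct_cut_grep() {
  cut -d% -f1 | grep -oE '[0-9]{1,3}'
}

# Illustrative variation: match the digits together with the %, then delete the %.
pct_strict() {
  grep -oE '[0-9]{1,3}%' | tr -d '%'
}

echo 'abc 100%'   | pct_cut_grep   # one digit run, so only 100 is reported
echo 'v2 at 100%' | pct_cut_grep   # also reports the 2 from "v2"
echo 'v2 at 100%' | pct_strict     # only the digits directly before the %
```

If the input never contains other digits before the %, both behave the same.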

Upvotes: 3

Wiktor Stribiżew

Reputation: 627083

The leading .* is greedy: it grabs as many characters as it can, including digits, leaving only the last of the three digits in 100% for the capture group.

Use

sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'

Details

  • (.*[^0-9])? - (Group 1) an optional sequence of any 0 or more chars up to and including a non-digit char
  • ([0-9]{1,3}) - (Group 2) one to three digits
  • % - a % char
  • .* - the rest of the string.

The match is replaced with the Group 2 contents, and that is the only value printed, since -n suppresses the default line output.
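
As a quick check, the following sketch runs the original expression and the corrected one side by side. The function names greedy and anchored and the sample strings are made up for illustration; the sed expressions are the ones discussed above.

```shell
# Original expression: the greedy .* leaves only one digit for the group.
greedy() {
  sed -nE 's/.*([0-9]{1,3})%.*/\1/p'
}

# Corrected expression: the optional prefix must end in a non-digit.
anchored() {
  sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'
}

for s in '100%' 'abc 42% done' 'load: 7%'; do
  printf '%s -> greedy: %s, anchored: %s\n' \
    "$s" "$(echo "$s" | greedy)" "$(echo "$s" | anchored)"
done
```

For 100% the first function prints 0 and the second prints 100, which matches the behaviour described in the question.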

Upvotes: 4
