octosquidopus

Reputation: 871

sed - how to remove everything but a defined pattern?

I have to remove everything but 1, 2, or 3 digits (0-9, or 10-99, or 100) preceding % (I don't want to see the %, though) from another command's output and pipe it forward to another command. I know that

sed -n '/%/p'

will show only the line(s) containing %, but that's not what I want. How can I get rid of the rest of the unwanted text and leave only these numbers to then pipe them to another command?

Upvotes: 16

Views: 48316

Answers (6)

NeronLeVelu

Reputation: 10039

sed -n "/[0-9]\{1,2\}%/ s/^[^0-9]*\([0-9]\{1,2\}\)%.*/\1/p
/100%/ s/.*/100/p
"

The 100% case is handled by its own rule; otherwise numbers such as 987% (or 123%, if the first digit were restricted to 1) would also be sent to the output.

Upvotes: 0

ghostdog74

Reputation: 342303

Use awk instead of sed.

$ cat file
one two 100% three
10% four 1% five

$ awk '{
  for(i=1;i<=NF;i++) 
   if ($i ~/%$/) { print $i+0} }
  ' file
100
10
1

For each field, check whether it ends with a % sign. If it does, print the number ($i+0 forces a numeric conversion, which drops the trailing %). Only a minimal regular expression is needed.
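A stricter variant of the field loop, as a sketch only: it assumes that a field should count only when it is nothing but digits followed by %, and that values above 100 should be dropped, as the question describes. The function name percent_fields is hypothetical.

```shell
# Hypothetical helper: reads text on stdin, prints one number per line.
percent_fields() {
  awk '{
    for (i = 1; i <= NF; i++)
      # Accept only fields made of digits plus a trailing %, capped at 100.
      if (($i ~ /^[0-9]+%$/) && ($i + 0 <= 100))
        print $i + 0
  }'
}

# "abc%" and "1234%" are skipped; 100 and 10 are printed.
printf 'one two 100%% three\nabc%% 10%% 1234%% five\n' | percent_fields
```

With the plain /%$/ test, a field such as abc% would print 0, because a non-numeric string converts to 0.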

Upvotes: 0

brandizzi

Reputation: 27050

EDIT: I have misunderstood the OP and posted an invalid answer. I changed it to an answer that, I believe, would solve the problem in the more general scenario.

For a file such as the one below:

$ cat input
abc
123%
123
abc%
this is 456% and nothing more
456

Use sed -n -E 's/(^|.*[^0-9])([0-9]{1,3})%.*/\2/p' input

$  sed  -n -E 's/(^|.*[^0-9])([0-9]{1,3})%.*/\2/p' input
123
456

The -n flag makes sed suppress its automatic output of each line. The -E flag enables extended regular expressions. (Older versions of GNU sed spell this flag -r; recent versions accept both -E and -r.)

Now comes the s/// command. The group (^|.*[^0-9]) matches either the beginning of the line (^) or a series of zero or more characters (.*) ending in a non-digit character ([^0-9]). The second group, ([0-9]{1,3}), matches one to three digits; because it must be preceded by (^|.*[^0-9]) and followed by %, it cannot pick up part of a longer number. The trailing .* matches the rest of the line, so the pattern consumes the whole line. We then replace everything with the second group using the backreference \2. Since we passed -n to sed, nothing would be printed by default, but the p flag on the s/// command prints the line whenever a substitution is made. Note that this p is a flag of s///, not the standalone p command, because it comes right after the last /.
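To see why the leading group matters, here is a small comparison sketch. The function names extract_naive and extract_guarded are hypothetical, and -E is assumed to be available (it is in current GNU and BSD sed).

```shell
# Hypothetical helpers: both read stdin and print one result per matching line.
extract_naive() {
  # A bare greedy .* swallows the leading digits, leaving one digit for the group.
  sed -n -E 's/.*([0-9]{1,3})%.*/\1/p'
}

extract_guarded() {
  # The first group must end at a non-digit (or at line start),
  # so the second group receives the whole number.
  sed -n -E 's/(^|.*[^0-9])([0-9]{1,3})%.*/\2/p'
}

printf 'this is 456%% and nothing more\n' | extract_naive     # prints 6
printf 'this is 456%% and nothing more\n' | extract_guarded   # prints 456
printf '1234%%\n' | extract_guarded                           # prints nothing
```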

Upvotes: 3

carlpett

Reputation: 12583

Here's my shot:

sed -e '/^[0-9]\{1,3\}%$/ bnum' -e 'd' -e ':num' -e 's/%//'

If the line is 1-3 digits followed by a %, it removes the %-sign. Otherwise, it removes the entire line. So, for input such as

adsf
50
52%
 1
 12%
test%
1234%
%%%
85%
bye

It yields

52
85
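The same whole-line filter can also be written without a branch, by deleting every line that does not match and then stripping the % from what is left. This is a sketch using the basic-regex interval \{1,3\}; the function name only_percent_lines is hypothetical.

```shell
# Hypothetical helper: keeps lines that are exactly 1-3 digits plus %, minus the %.
only_percent_lines() {
  sed '/^[0-9]\{1,3\}%$/!d; s/%//'
}

# Same sample input as above; lines with leading spaces or 4 digits are dropped.
printf '%s\n' 'adsf' '50' '52%' ' 1' ' 12%' 'test%' '1234%' '%%%' '85%' 'bye' |
  only_percent_lines
```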

Upvotes: 0

glenn jackman

Reputation: 246744

If you're not completely tied to sed, this is exactly what grep -o does:

grep -o '[0-9]\{1,3\}%'
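Note that grep -o prints the % along with the digits, which the question wants to avoid. One way to drop it, sketched here with a hypothetical function name extract_percent, is to pipe the matches through tr. The -o option is assumed to be available (GNU and BSD grep have it; it is not in POSIX).

```shell
# Hypothetical helper: reads text on stdin, prints one number per line.
extract_percent() {
  # grep -o emits each match on its own line; tr then deletes the % signs.
  grep -o '[0-9]\{1,3\}%' | tr -d '%'
}

printf 'one two 100%% three\n10%% four 1%% five\n' | extract_percent
```

Unlike the line-based sed answers, this finds every percentage on a line. The pattern is not anchored on the left, though, so an input such as 1234% yields 234.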

Upvotes: 31

Ben Jackson

Reputation: 93700

sed -e 's/[^0-9]*\([0-9]*\)%.*/\1/' captures the digits in a group, and because the pattern also matches the text around them (the leading [^0-9]* and the trailing .*), everything else gets discarded.

(my pattern matches any number of digits: sed's basic regular expressions need the backslashed interval form [0-9]\{1,3\} rather than the handy [0-9]{1,3} that you see in perlre and others, so I elected to keep it simple to illustrate the principle you cared about)

Edit: fixed the quoting and replaced the leading .* with [^0-9]* to stop the greedy match from consuming the numbers. Once again, this is more straightforward with perlre, where you can use a non-greedy .*?
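As written, the one-liner prints lines without a % unchanged, because sed echoes every line by default. A sketch that combines it with -n and the p flag, and bounds the digits with the basic-regex interval, might look like this. The function name digits_before_percent is hypothetical, and the sketch assumes no other digits appear earlier on the line.

```shell
# Hypothetical helper: prints the number before the first % on each matching line.
digits_before_percent() {
  # -n silences the default output; the trailing p prints only substituted lines.
  sed -n 's/[^0-9]*\([0-9]\{1,3\}\)%.*/\1/p'
}

printf 'no percent here\nload: 42%% used\n' | digits_before_percent   # prints 42
```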

Upvotes: 0
