Steven

Reputation: 599

Extract numbers from a string using sed and regular expressions

Another question for the sed experts.

I have a string representing a pathname that will have two numbers in it. An example is:

./pentaray_run2/Trace_220560.dat

I need to extract the second of these numbers, i.e. 220560.

I have (with some help from the forums) been able to extract all the digits together (i.e. 2220560) with:

sed "s/[^0-9]//g"

or extract only the first number with:

sed -r 's|^([^.]+).*$|\1|; s|^[^0-9]*([0-9]+).*$|\1|'

But what I'm after is the second number!! Any help much appreciated.

PS the number I'm after is always the second number in the string.

Upvotes: 33

Views: 109916

Answers (4)

potong

Reputation: 58568

This might work for you (GNU sed):

sed -r 's/([^0-9]*([0-9]*)){2}.*/\2/' file

This extracts the second number:

sed -r 's/([^0-9]*([0-9]*)){1}.*/\2/' file

and this extracts the first.
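To see both counts side by side, here is a small sketch. The helper name nth_number is made up for illustration, and -E is used in place of -r on the assumption that your sed accepts it (recent GNU and BSD versions do):

```shell
# nth_number N STRING: print the Nth run of digits in STRING.
# Hypothetical helper. It relies on \2 holding the capture from the
# last repetition of the outer group.
nth_number() {
    printf '%s\n' "$2" | sed -E "s/([^0-9]*([0-9]*)){$1}.*/\\2/"
}

nth_number 1 './pentaray_run2/Trace_220560.dat'   # prints 2
nth_number 2 './pentaray_run2/Trace_220560.dat'   # prints 220560
```

Only the repetition count changes between the two calls; the rest of the expression stays the same.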

Upvotes: 6

You can extract the last number with this:

sed -e 's/.*[^0-9]\([0-9]\+\)[^0-9]*$/\1/'

It is easier to think about this backwards:

  1. From the end of the string, match zero or more non-digit characters
  2. Match (and capture) one or more digit characters
  3. Match at least one non-digit character
  4. Match all the characters to the start of the string

Part 3 of the match is where the "magic" happens, but it also requires at least one non-digit before the number (i.e. you can't match a string whose only number is at the very start, although there is a simple workaround: insert a non-digit at the start of the string).

The magic is that part 3 counteracts the left-to-right greediness of the .* (part 4). Without part 3, part 4 would consume as much as it can, including all but the last digit of the number. With part 3, the .* has to stop early enough to leave a non-digit followed by the digits for parts 3 and 2, so the whole number is captured.
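The workaround of inserting a non-digit can be sketched like this. The function name last_number is only for illustration, and [0-9][0-9]* is written out instead of \+ so the sketch does not depend on that GNU extension:

```shell
# last_number STRING: print the last run of digits in STRING.
# First prepend a non-digit ("_") so part 3 always has something to match,
# even when the only number sits at the very start of the string.
last_number() {
    printf '%s\n' "$1" |
        sed -e 's/^/_/' -e 's/.*[^0-9]\([0-9][0-9]*\)[^0-9]*$/\1/'
}

last_number './pentaray_run2/Trace_220560.dat'   # prints 220560
last_number '220560.dat'                         # prints 220560
```

A string with no digits at all does not match the second expression, so it comes back unchanged apart from the added underscore.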

Upvotes: 12

Gilles Quénot

Reputation: 185790

If grep is welcome:

$ echo './pentaray_run2/Trace_220560.dat' | grep -oP '\d+\D+\K\d+'
220560

Or, more portably, in Perl with the same regex:

$ echo './pentaray_run2/Trace_220560.dat' | perl -lne 'print $& if /\d+\D+\K\d+/'
220560

I think this approach is cleaner and more robust than using sed.
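If your grep was built without -P, a similar result can be had without \K. This is a different technique from the regex above, not a translation of it: list every run of digits one per line, then pick the second line. It assumes grep supports -o, which GNU and BSD grep do but POSIX does not require, and the function name second_number is only illustrative:

```shell
# second_number STRING: print the second run of digits in STRING.
# grep -oE prints each match on its own line; sed -n '2p' keeps line 2.
second_number() {
    printf '%s\n' "$1" | grep -oE '[0-9]+' | sed -n '2p'
}

second_number './pentaray_run2/Trace_220560.dat'   # prints 220560
```

Changing the 2 in '2p' picks a different number in the string.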

Upvotes: 9

Kent

Reputation: 195239

Is this OK?

sed -r 's/.*_([0-9]*)\..*/\1/g'

with your example:

kent$   echo "./pentaray_run2/Trace_220560.dat"|sed -r 's/.*_([0-9]*)\..*/\1/g'
220560

Upvotes: 34
