Reputation: 49
I have the following string, from which I need to extract the value 14.123456 that comes directly after the keyword airline_freq: (a keyword that is unique in my string).
Please help me find the correct regex (indexing m.group() doesn't work beyond 0).
import re
s = "DATA:init: 221.000OTHER:airline_freq: 14.123456FEATURE:airline_amp: 0.333887 more text"
m = re.search(r'[airline_freq:\s]?\d*\.\d+|\d+', s)
m.group()
$ result 221.000
Upvotes: 0
Views: 545
Reputation: 3189
This captures only the float, as a single group (available as m.group(1)).
r'airline_freq:\s+([-0-9.]+)'
"DATA:init: 221.000OTHER:airline_freq: 14.123456FEATURE:airline_amp: 0.333887 more text"
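A minimal sketch of how this pattern might be used from Python, assuming the string s from the question. The point it illustrates is that the parentheses create group 1, so the number is read with m.group(1), while m.group(0) is the whole match including the keyword.

```python
import re

s = "DATA:init: 221.000OTHER:airline_freq: 14.123456FEATURE:airline_amp: 0.333887 more text"

# One capture group around the number; the keyword anchors the match.
m = re.search(r'airline_freq:\s+([-0-9.]+)', s)

if m:
    text = m.group(1)    # only what the parentheses captured
    value = float(text)  # convert the captured text to a float
    print(text, value)
```

Note that [-0-9.]+ is deliberately loose: it would also accept something like 1.2.3, in which case float() would raise ValueError.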
Upvotes: 1
Reputation: 28406
You can probably use this:
(?<=airline_freq:)\s*(?:-?(?:\d+(?:\.\d*)?|\.\d+))
This uses a lookbehind to enforce that the number is preceded by airline_freq: without making it part of the match.
The number-matching part of the regex can match numbers with or without a . and, if there is a ., it can also be leading or trailing (though clearly not before the - sign). You can also allow an optional + in place of the - by using [+-] instead of -.
Unfortunately, it seems Python does not allow variable-length lookbehind, so I cannot put the \s* inside it; as a consequence, the spaces between the : and the number are part of the match. In general this should not be a problem, since leading spaces are usually skipped automatically when a number is parsed by a program.
However, you can still remove the first ?: in the regex above to make the number-matching group capturing, so that the number is available as \1.
The example is here.
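As a rough sketch of the two points above, assuming the string s from the question: the first part removes the first ?: so the number lands in group 1, and the second part checks that Python's re module rejects a lookbehind containing \s*. The variable names are only illustrative.

```python
import re

s = "DATA:init: 221.000OTHER:airline_freq: 14.123456FEATURE:airline_amp: 0.333887 more text"

# Same pattern with the first "?:" removed, so the number is capturing group 1.
pattern = re.compile(r'(?<=airline_freq:)\s*(-?(?:\d+(?:\.\d*)?|\.\d+))')
m = pattern.search(s)

whole = m.group(0)    # includes the whitespace matched by \s*
number = m.group(1)   # just the number, without leading whitespace

# A variable-width lookbehind is rejected by the re module at compile time.
try:
    re.compile(r'(?<=airline_freq:\s*)\d+')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

print(repr(whole), repr(number), lookbehind_ok)
```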
Upvotes: 1
Reputation: 167
I have this:
(?<=airline_freq\:\s\s)(\d+\.\d+)
In [2]: import re
...: s = "DATA:init: 221.000OTHER:airline_freq: 14.123456FEATURE:airline_amp: 0.333887 more text"
...: m = re.search(r'(?<=airline_freq\:\s\s)(\d+\.\d+)', s)
...: m.group()
Out[2]: '14.123456'
Test: https://regexr.com/51q41
If you're not sure how many spaces separate airline_freq: from the desired float, you can use:
(?<=airline_freq\:)\s*(\d+\.\d+)
and then call m.group().lstrip() to strip the leading spaces.
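One possible way to wrap this up, as a sketch only: the helper name extract_airline_freq is hypothetical, and returning None when the keyword is missing is an assumption about the desired behaviour, not something stated in the question.

```python
import re

def extract_airline_freq(text):
    """Return the float after 'airline_freq:', or None if it is absent.

    Hypothetical helper; the name and the None fallback are assumptions.
    """
    m = re.search(r'(?<=airline_freq:)\s*(\d+\.\d+)', text)
    if m is None:
        return None
    # The whole match may start with spaces, so strip them before converting.
    return float(m.group().lstrip())

s = "DATA:init: 221.000OTHER:airline_freq: 14.123456FEATURE:airline_amp: 0.333887 more text"
found = extract_airline_freq(s)
missing = extract_airline_freq("no keyword here 1.5")
print(found, missing)
```

Checking for None before calling m.group() avoids an AttributeError when the keyword does not appear in the input.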
Upvotes: 0