re.search( ): (\d+) matches only a single digit

Question

I want to parse the value 387 KB/s from the string:

str1 = '2015-07-02 02:05:02 (387 KB/s)'

The regular expression I have written for it is this:

mbps = re.search('\d+-\d+-\d+ \d+:\d+:\d+ .*(\d+) (.*/s)',str1)
var = mbps.group(1)

Printing var gives me only 7 instead of 387, i.e. it matches only a single digit.

How can I get the complete number, i.e. 387?

Thanks.

Tim Pietzcker · Accepted Answer

The problem is that .* is greedy (it matches as much as it can) and it can also match digits. So it matches (38 and leaves only 7 for the \d+, which needs just one digit to succeed and therefore has no reason to expand its match.
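
To see where the digits go, here is a small sketch that wraps the .* in an extra capturing group. That extra group is an addition for illustration only, not part of the original pattern:

```python
import re

str1 = '2015-07-02 02:05:02 (387 KB/s)'

# Same pattern as in the question, except that .* is wrapped in a group
# so we can inspect what the greedy quantifier ended up consuming.
greedy = re.search(r'\d+-\d+-\d+ \d+:\d+:\d+ (.*)(\d+) (.*/s)', str1)

print(greedy.groups())  # ('(38', '7', 'KB/s')
```

The .* first grabs everything to the end of the string, then gives characters back one at a time until the rest of the pattern can match. It stops giving back as soon as \d+ has a single digit to work with.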

One possible solution would be to make the quantifier lazy:

mbps = re.search(r'\d+-\d+-\d+ \d+:\d+:\d+ .*?(\d+) (.*/s)',str1)

A better solution would be more specific, for example disallowing digits:

mbps = re.search(r'\d+-\d+-\d+ \d+:\d+:\d+ [^\d]*(\d+) (.*/s)',str1)
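
As a quick check, both variants can be run against the string from the question. The variable names lazy and strict are just labels chosen for this sketch:

```python
import re

str1 = '2015-07-02 02:05:02 (387 KB/s)'

# Lazy quantifier: .*? consumes as little as possible before \d+ starts.
lazy = re.search(r'\d+-\d+-\d+ \d+:\d+:\d+ .*?(\d+) (.*/s)', str1)

# Negated character class: [^\d]* cannot consume digits at all.
strict = re.search(r'\d+-\d+-\d+ \d+:\d+:\d+ [^\d]*(\d+) (.*/s)', str1)

print(lazy.group(1), lazy.group(2))      # 387 KB/s
print(strict.group(1), strict.group(2))  # 387 KB/s
```

Both give 387 here. The negated class has the advantage that it can never swallow digits, whatever the surrounding text looks like, so the result does not depend on how far the engine backtracks.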

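Also, always use raw strings with regexes, so that backslashes reach the regex engine unchanged instead of being interpreted as string escapes first.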
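
A minimal illustration of why raw strings matter, using \b, which is a valid string escape (backspace) as well as a regex token (word boundary). The \bKB\b pattern is a made-up example and is not needed for the original problem:

```python
import re

str1 = '2015-07-02 02:05:02 (387 KB/s)'

# In a normal string literal '\b' is a single backspace character;
# in a raw string r'\b' stays as backslash + 'b' (a regex word boundary).
print(len('\b'), len(r'\b'))  # 1 2

plain = re.search('\bKB\b', str1)   # looks for backspace, 'KB', backspace
raw = re.search(r'\bKB\b', str1)    # looks for 'KB' as a whole word

print(plain)         # None
print(raw.group(0))  # KB
```

\d happens to work without the r prefix because it is not a recognised string escape, but recent Python versions emit a warning for such invalid escapes, so the raw string is the safer habit.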
