cosmin

Reputation: 43

Python REGEX How to extract particular numbers from variable

I have the following problem:

var a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123'

I'd like a regex to extract only the numbers: 15159970, 15615115, 11224455, 55441123

What I have so far:

re.findall(r'(\d+\s)\(', a)

which only extracts the first 2 numbers: 15159970, 15615115

I also have a second variable, b = 15159970, 15615115, 11224455, 55441126. I would like to compare the two variables and, if they differ, print("vars are different!").

Thanks!

Upvotes: 4

Views: 142

Answers (1)

Wiktor Stribiżew

Reputation: 626893

You may extract all chunks of digits that are not preceded by a digit or by a digit and a dot, and not followed by a digit or by a dot and a digit:

(?<!\d)(?<!\d\.)\d+(?!\.?\d)


Details

  • (?<!\d) - a negative lookbehind that fails the match at a location immediately preceded by a digit
  • (?<!\d\.) - a negative lookbehind that fails the match at a location immediately preceded by a digit and a dot
  • \d+ - one or more digits
  • (?!\.?\d) - a negative lookahead that fails the match at a location immediately followed by a digit or by a dot and a digit
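To see how the four parts listed above fit together, the same pattern can be written with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a restatement of the regex above for readability; the name NUMBER_RE is illustrative.

```python
import re

# Same pattern as above; re.VERBOSE lets each piece carry its own comment.
NUMBER_RE = re.compile(r"""
    (?<!\d)      # not immediately preceded by a digit
    (?<!\d\.)    # not immediately preceded by a digit and a dot
    \d+          # one or more digits
    (?!\.?\d)    # not immediately followed by a digit or by a dot and a digit
""", re.VERBOSE)

a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '
print(NUMBER_RE.findall(a))
# => ['15159970', '15615115', '11224455', '55441123']
```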

Python demo:

import re
a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '
print( re.findall(r'(?<!\d)(?<!\d\.)\d+(?!\.?\d)', a) )
# => ['15159970', '15615115', '11224455', '55441123']

Another solution: only extract the digit chunks outside of parentheses.

See this Python demo:

import re
text = "15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 (28.11.2014 12:43:14)"
print( list(filter(None, re.findall(r'\([^()]+\)|(\d+)', text))) )
# => ['15159970', '15615115', '11224455', '55441123']
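One thing worth checking with this input: the lookaround pattern from the first solution only rejects digits that touch a dot, so the time inside (28.11.2014 12:43:14) is not excluded by it, while the parentheses-based pattern skips the whole parenthesized chunk. A quick side-by-side comparison on the same text:

```python
import re

text = "15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 (28.11.2014 12:43:14)"

# First approach: lookarounds reject only digit runs adjacent to a dot.
lookaround = re.findall(r'(?<!\d)(?<!\d\.)\d+(?!\.?\d)', text)

# Second approach: parenthesized chunks are consumed by the first alternative,
# which leaves Group 1 empty; those empty strings are filtered out.
outside_parens = [x for x in re.findall(r'\([^()]+\)|(\d+)', text) if x]

print(lookaround)
# => ['15159970', '15615115', '11224455', '55441123', '12', '43', '14']
print(outside_parens)
# => ['15159970', '15615115', '11224455', '55441123']
```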

Here, the pattern \([^()]+\)|(\d+) matches:

  • \([^()]+\) - a (, then one or more characters other than ( and ), and then a )
  • | - or
  • (\d+) - one or more digits, captured into Group 1 (when the pattern contains a capturing group, re.findall returns only the captured substrings).

Empty strings appear in the result whenever the parenthesized alternative matches, because Group 1 does not participate in that match, so we need to remove them (either with list(filter(None, results)) or with [x for x in results if x]).
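The answer does not cover the second part of the question (comparing a and b). A minimal sketch, assuming b is a string such as '15159970, 15615115, 11224455, 55441126' (the question writes it without quotes): extract the numbers from both strings with the same pattern, then compare the resulting lists.

```python
import re

a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '
# Assumption: b is a plain comma-separated string of numbers.
b = '15159970, 15615115, 11224455, 55441126'

pattern = r'(?<!\d)(?<!\d\.)\d+(?!\.?\d)'
numbers_a = re.findall(pattern, a)
numbers_b = re.findall(pattern, b)

# List comparison is element by element, so order and duplicates matter.
if numbers_a != numbers_b:
    print("vars are different!")
```

If the order of the numbers should not matter, comparing set(numbers_a) with set(numbers_b) is one alternative, at the cost of ignoring duplicates.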

Upvotes: 2
