Reputation: 43
I have the following problem:
a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '
I'd like a regex to extract only the numbers: 15159970, 15615115, 11224455, 55441123
What I have so far:
re.findall(r'(\d+\s)\(', a)
which only extracts the first 2 numbers: 15159970, 15615115
I also have a second variable, b = 15159970, 15615115, 11224455, 55441126, and I would like to compare the two variables and print("vars are different!") if they differ.
Thanks!
Upvotes: 4
Views: 142
Reputation: 626893
You may extract all chunks of digits not preceded with a digit or digit + dot and not followed with a dot + digit or a digit:
(?<!\d)(?<!\d\.)\d+(?!\.?\d)
See the regex demo
Details
(?<!\d) - a negative lookbehind that fails a location immediately preceded with a digit
(?<!\d\.) - a negative lookbehind that fails a location immediately preceded with a digit and a dot
\d+ - 1+ digits
(?!\.?\d) - a negative lookahead that fails a location immediately followed with a digit or a dot + a digit.
import re
a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '
print( re.findall(r'(?<!\d)(?<!\d\.)\d+(?!\.?\d)', a) )
# => ['15159970', '15615115', '11224455', '55441123']
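The second part of the question, comparing the two variables, could be handled by running the same pattern over both and comparing the resulting lists. This is only a minimal sketch: it assumes b is also a string (the question does not show its exact type), and extract_numbers is a hypothetical helper name introduced here for illustration.

```python
import re

# Same pattern as above: digit chunks that are not part of a dotted date.
PATTERN = r'(?<!\d)(?<!\d\.)\d+(?!\.?\d)'

def extract_numbers(s):
    """Hypothetical helper: return the standalone digit chunks found in s."""
    return re.findall(PATTERN, s)

a = ' 15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 '
# Assumption: b is a string; the question does not show how it is defined.
b = '15159970, 15615115, 11224455, 55441126'

nums_a = extract_numbers(a)
nums_b = extract_numbers(b)

# List comparison is order-sensitive and compares the numbers as strings.
if nums_a != nums_b:
    print("vars are different!")
```

If the order of the numbers should not matter, comparing set(nums_a) with set(nums_b) instead would be one option.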
Another solution: only extract the digit chunks outside of parentheses.
See this Python demo:
import re
text = "15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 (28.11.2014 12:43:14)"
print( list(filter(None, re.findall(r'\([^()]+\)|(\d+)', text))) )
# => ['15159970', '15615115', '11224455', '55441123']
Here, \([^()]+\)|(\d+) matches
\([^()]+\) - (, any 1+ chars other than ( and ) and then )
| - or
(\d+) - matches and captures into Group 1 one or more digits (re.findall only includes captured substrings if there is a capturing group in the pattern).
Empty items appear in the result whenever the first alternative (a parenthesized chunk) matches, because Group 1 does not participate in that match. Thus, we need to remove them (either with list(filter(None, results)) or with [x for x in results if x]).
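To make the empty items visible, the raw re.findall result can be printed before filtering. The sketch below reuses the text from the demo above and applies the list comprehension variant; the variable names results and numbers are chosen here only for illustration.

```python
import re

text = "15159970 (30.12.2015), 15615115 (01.01.1970), 11224455, 55441123 (28.11.2014 12:43:14)"

# Each parenthesized chunk is matched by the first alternative, which has
# no capturing group, so Group 1 is empty for those matches.
results = re.findall(r'\([^()]+\)|(\d+)', text)
print(results)
# => ['15159970', '', '15615115', '', '11224455', '55441123', '']

# Keep only the non-empty captures.
numbers = [x for x in results if x]
print(numbers)
# => ['15159970', '15615115', '11224455', '55441123']
```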
Upvotes: 2