Reputation: 966
I have a set of strings in which numbers are separated by various characters or letters:
12
14:45:09
2;32
04:43
434.34
43M 343ho
I want to get a list of these numbers for each row:
[12]
[14, 45, 9]
[2, 32]
[4, 43]
[434, 34]
[43, 343]
I tried the following, but it does not work:
>>> import re
>>> pattern = r'(\d*)'
>>> re.split(pattern, '12')
['', '12', '', '', '']
>>> re.split(pattern, '14:45:09')
['', '14', '', '', ':', '45', '', '', ':', '09', '', '', '']
>>> pattern = r'([0-9]*)'
>>> re.split(pattern, '14:45:09')
['', '14', '', '', ':', '45', '', '', ':', '09', '', '', '']
>>> re.split(pattern, '43M 343ho')
['', '43', '', '', 'M', '', ' ', '343', '', '', 'h', '', 'o', '', '']
>>>
How can this be done correctly?
Upvotes: 0
Views: 162
Reputation: 38219
from sys import stdin
import re
for line in stdin:
    # split on runs of non-digits and drop the empty strings left at the edges
    result = [int(x) for x in re.split(r'\D+', line) if x]
    print(result)
Or, with re.findall in place of re.split:
result = [int(x) for x in re.findall(r'\d+',line)]
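For trying this without piping anything into stdin, here is a minimal sketch that wraps the re.findall variant in a function and runs it over the strings from the question. The name extract_numbers is only illustrative, not something from the answer above.

```python
import re

def extract_numbers(line):
    """Return every run of digits in `line` as an int (hypothetical helper)."""
    # \d+ matches one or more digits, so it can never produce an empty match;
    # int() also drops leading zeros, e.g. '09' -> 9
    return [int(x) for x in re.findall(r'\d+', line)]

samples = ['12', '14:45:09', '2;32', '04:43', '434.34', '43M 343ho']
for s in samples:
    print(extract_numbers(s))
```

The original attempt used \d*, which can also match the empty string. That is likely where the extra '' entries in the re.split output came from.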
Upvotes: 1
Reputation: 163642
Instead of re.split, you could use re.findall with a pattern that matches zero or more leading zeros and then captures one or more digits:
0*(\d+)
For example:
import re
regex = r"0*(\d+)"
strings = [
"12",
"14:45:09",
"2;32",
"04:43",
"434.34",
"43M 343ho"
]
for s in strings:
print(re.findall(regex, s))
Output (note that the items are strings):
['12']
['14', '45', '9']
['2', '32']
['4', '43']
['434', '34']
['43', '343']
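One edge case worth checking is a group made up only of zeros. A small sketch, assuming the same pattern as above (the names LEADING_ZEROS and digit_groups are made up for illustration): because \d+ needs at least one digit, the regex engine backtracks and leaves the last zero for the capture group, so "00" comes out as '0' rather than an empty string.

```python
import re

# Same pattern as in the answer, precompiled; the names are illustrative only
LEADING_ZEROS = re.compile(r"0*(\d+)")

def digit_groups(s):
    """Return each digit run in `s` as a string, with leading zeros stripped."""
    return LEADING_ZEROS.findall(s)

print(digit_groups("00:05"))                    # ['0', '5']
print([int(g) for g in digit_groups("04:43")])  # [4, 43]
```

If integers are what you need in the end, calling int() on each match gives the list format asked for in the question.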
Upvotes: 3
Reputation: 907
With str.split:
"14:45:09".split(':')
The argument to split is the separator string on which to split.
With re:
re.split(r':', "14:45:09")
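Both calls above split on a single separator only. As a rough sketch of how the re.split form might be extended to several punctuation separators at once, a character class can be used. The function name split_numbers and its separators parameter are assumptions for illustration.

```python
import re

def split_numbers(s, separators=":;."):
    """Split `s` on any one of `separators` and convert the parts to int."""
    # re.escape keeps characters such as '.' literal inside the class
    pattern = "[" + re.escape(separators) + "]"
    # skip empty parts, which appear when a separator starts or ends the string
    return [int(part) for part in re.split(pattern, s) if part]

print(split_numbers("14:45:09"))  # [14, 45, 9]
print(split_numbers("434.34"))    # [434, 34]
```

This does not cover inputs such as '43M 343ho', where letters and spaces act as separators. For those, splitting on \D+ or matching \d+ directly is the more general approach.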
Upvotes: 0