Reputation: 359
I have this string:
a = "IN 744301 Mus Andaman & Nicobar Islands 01 Nicobar 638 Carnicobar 9.2333 92.7833 4"
I want to split it with a regular expression wherever a number occurs, so that the output looks like this:
['IN', '744301', 'Mus Andaman & Nicobar Islands', '01', 'Nicobar', '638', 'Carnicobar', '9.2333', '92.7833', '4']
Upvotes: 3
Views: 123
Reputation: 71451
You can use a lookahead and lookbehind:
import re
a = "IN 744301 Mus Andaman & Nicobar Islands 01 Nicobar 638 Carnicobar 9.2333 92.7833 4"
# Raw string so that \d and \s reach the regex engine without invalid-escape warnings
new_a = re.split(r'(?<=\d)\s+|\s+(?=\d)', a)
Output:
['IN', '744301', 'Mus Andaman & Nicobar Islands', '01', 'Nicobar', '638', 'Carnicobar', '9.2333', '92.7833', '4']
Regex explanation:
(?<=\d)\s+ : matches a run of whitespace (\s+) that is preceded by a digit (\d).
\s+(?=\d) : matches a run of whitespace that is followed by a digit.
| : alternation; whitespace that matches either pattern becomes a split point.
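To see what each half of the pattern contributes, here is a small sketch that runs the two alternatives separately and then together. The string s is a shortened, hypothetical sample rather than the full input from the question.

```python
import re

# Shortened, hypothetical sample so the pieces are easy to follow.
s = "IN 744301 Mus Andaman 01 Nicobar"

# Lookbehind only: splits on whitespace that comes right after a digit.
after_digit = re.split(r'(?<=\d)\s+', s)

# Lookahead only: splits on whitespace that comes right before a digit.
before_digit = re.split(r'\s+(?=\d)', s)

# Both alternatives together isolate every number.
both = re.split(r'(?<=\d)\s+|\s+(?=\d)', s)

print(after_digit)   # ['IN 744301', 'Mus Andaman 01', 'Nicobar']
print(before_digit)  # ['IN', '744301 Mus Andaman', '01 Nicobar']
print(both)          # ['IN', '744301', 'Mus Andaman', '01', 'Nicobar']
```

Neither lookaround consumes the digit itself, so the numbers stay intact and only the surrounding whitespace is removed.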
Upvotes: 4
Reputation: 22952
You can use re.split with a capturing group (parentheses) to keep the delimiters (the numbers) in the result:
>>> import re
>>> a = "IN 744301 Mus Andaman & Nicobar Islands 01 Nicobar 638 Carnicobar 9.2333 92.7833 4"
>>> re.split(r'(\d+(?:\.\d+)?)', a)
['IN ', '744301', ' Mus Andaman & Nicobar Islands ', '01', ' Nicobar ', '638', ' Carnicobar ', '9.2333', ' ', '92.7833', ' ', '4', '']
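The pieces above still carry surrounding spaces, plus a few blank or empty entries. One possible clean-up, sketched here as an assumption about the desired output rather than part of the answer itself, is to strip each piece and drop whatever ends up empty. The names pieces and cleaned are only illustrative.

```python
import re

a = "IN 744301 Mus Andaman & Nicobar Islands 01 Nicobar 638 Carnicobar 9.2333 92.7833 4"

# The capturing group keeps the numbers in the result of re.split.
pieces = re.split(r'(\d+(?:\.\d+)?)', a)

# Strip surrounding whitespace and discard pieces that are empty afterwards.
cleaned = [p.strip() for p in pieces if p.strip()]

print(cleaned)
```

This should give the ten-element list asked for in the question.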
Upvotes: 1
Reputation: 57033
You can split by a number-like pattern and then findall by the same pattern. Since split and findall are "sister" functions, you will get both the non-numeric and the numeric pieces. Now, zip them into a single list and eliminate spaces.
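import re
from itertools import chain, zip_longest
a = "IN 744301 Mus Andaman & Nicobar Islands 01 Nicobar 638 Carnicobar 9.2333 92.7833 4"
# You can improve the regex to cover numbers that start with a .
NUMBER = r'\d+(?:\.\d*)?'
# split returns one more piece than findall; zip_longest keeps that last piece
combined = chain.from_iterable(zip_longest(re.split(NUMBER, a),
                                           re.findall(NUMBER, a),
                                           fillvalue=''))
# Strip surrounding whitespace and drop empty pieces
result = [x for x in map(str.strip, combined) if x]
#['IN', '744301', 'Mus Andaman & Nicobar Islands', '01', 'Nicobar',
# '638', 'Carnicobar', '9.2333', '92.7833', '4']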
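One thing to watch with the split/findall pairing: split always returns one more piece than findall, so a plain zip would drop any text that follows the last number. The string in the question ends with a number, so nothing is lost there. The sketch below uses a hypothetical string and a hypothetical helper, interleave, to show the difference between zip and itertools.zip_longest.

```python
import re
from itertools import chain, zip_longest

NUMBER = r'\d+(?:\.\d*)?'
s = "Room 12 East"  # hypothetical input with text after the last number

texts = re.split(NUMBER, s)      # ['Room ', ' East']
numbers = re.findall(NUMBER, s)  # ['12']

def interleave(pairs):
    """Flatten (text, number) pairs, strip whitespace, drop empty pieces."""
    return [x for x in map(str.strip, chain.from_iterable(pairs)) if x]

# zip stops at the shorter input, so the trailing text is lost.
truncated = interleave(zip(texts, numbers))

# zip_longest pads the shorter input, so the trailing text survives.
complete = interleave(zip_longest(texts, numbers, fillvalue=''))

print(truncated)  # ['Room', '12']
print(complete)   # ['Room', '12', 'East']
```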
Upvotes: 1