Python logix

Reputation: 359

Split a string wherever a number occurs

I have this string

a = "IN 744301 Mus Andaman & Nicobar Islands   01  Nicobar 638 Carnicobar 9.2333  92.7833 4"

I want to split this with a regular expression wherever a number occurs, so that the output looks like this:

['IN', '744301', 'Mus Andaman & Nicobar Islands', '01', 'Nicobar', '638', 'Carnicobar', '9.2333', '92.7833', '4']

Upvotes: 3

Views: 123

Answers (3)

Ajax1234

Reputation: 71451

You can use a lookahead and lookbehind:

import re
a = "IN 744301 Mus Andaman & Nicobar Islands   01  Nicobar 638 Carnicobar 9.2333  92.7833 4"
# Raw string, so \d and \s reach the regex engine unchanged
new_a = re.split(r'(?<=\d)\s+|\s+(?=\d)', a)
print(new_a)

Output:

['IN', '744301', 'Mus Andaman & Nicobar Islands', '01', 'Nicobar', '638', 'Carnicobar', '9.2333', '92.7833', '4']

Regex explanation:

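(?<=\d)\s+: matches a run of one or more whitespace characters (\s+) that is preceded by a digit (\d).

\s+(?=\d): matches a run of one or more whitespace characters that is followed by a digit.

|: alternation; a whitespace run is a split point if either pattern matches it. Because the lookbehind (?<=...) and the lookahead (?=...) are zero-width, the digits themselves are never consumed, so only the whitespace is removed by the split.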
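To see why both alternatives are needed, here is a small sketch that splits the question's string with each half of the pattern on its own and then with the combined pattern. The names AFTER_DIGIT, BEFORE_DIGIT, after_only, before_only and both are illustrative only.

```python
import re

a = "IN 744301 Mus Andaman & Nicobar Islands   01  Nicobar 638 Carnicobar 9.2333  92.7833 4"

AFTER_DIGIT = r'(?<=\d)\s+'   # whitespace run that comes right after a digit
BEFORE_DIGIT = r'\s+(?=\d)'   # whitespace run that comes right before a digit

# Each alternative on its own only cuts on one side of a number.
after_only = re.split(AFTER_DIGIT, a)
before_only = re.split(BEFORE_DIGIT, a)
# Joined with |, a whitespace run is removed if it touches a number on either side.
both = re.split(AFTER_DIGIT + '|' + BEFORE_DIGIT, a)

print(after_only)
print(before_only)
print(both)
```

With only the lookbehind, 'IN 744301' stays together; with only the lookahead, '744301 Mus Andaman & Nicobar Islands' stays together. Only the combined pattern produces the list asked for in the question.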

Upvotes: 4

Laurent LAPORTE

Reputation: 22952

You can use re.split with a group (capturing parentheses) to keep the delimiters (the numbers) in the result:

>>> import re
>>> a = "IN 744301 Mus Andaman & Nicobar Islands   01  Nicobar 638 Carnicobar 9.2333  92.7833 4"
>>> re.split(r'(\d+(?:\.\d+)?)', a)
['IN ', '744301', ' Mus Andaman & Nicobar Islands   ', '01', '  Nicobar ', '638', ' Carnicobar ', '9.2333', '  ', '92.7833', ' ', '4', '']
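The list above still carries the spaces around the text pieces, plus whitespace-only and empty strings. A minimal post-processing sketch (the variable names parts and result are illustrative) that strips each piece and drops the blank ones, giving the exact list from the question:

```python
import re

a = "IN 744301 Mus Andaman & Nicobar Islands   01  Nicobar 638 Carnicobar 9.2333  92.7833 4"
# The capturing group makes re.split keep each matched number as its own item.
parts = re.split(r'(\d+(?:\.\d+)?)', a)
# Strip the padding around each piece and drop the whitespace-only/empty ones.
result = [p.strip() for p in parts if p.strip()]
print(result)
```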

Upvotes: 1

DYZ

Reputation: 57033

You can split by a number-like pattern and then run findall with the same pattern. Since re.split returns the text between the matches and re.findall returns the matches themselves, together they give you both the non-numeric and the numeric pieces. Zip them into a single list, then strip the whitespace and drop the empty strings.

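import re
from itertools import chain, zip_longest

a = "IN 744301 Mus Andaman & Nicobar Islands   01  Nicobar 638 Carnicobar 9.2333  92.7833 4"
# You can improve the regex to cover numbers that start with a .
NUMBER = r'\d+(?:\.\d*)?'
# split() returns one more piece than findall(), so zip_longest (rather than zip)
# keeps any text that follows the last number
combined = chain.from_iterable(zip_longest(re.split(NUMBER, a),
                                           re.findall(NUMBER, a),
                                           fillvalue=''))
result = [x for x in map(str.strip, combined) if x]
#['IN', '744301', 'Mus Andaman & Nicobar Islands', '01', 'Nicobar',
# '638', 'Carnicobar', '9.2333', '92.7833', '4']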
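Whether the pieces are paired with zip or itertools.zip_longest matters when there is text after the last number. The sketch below uses a hypothetical input s (not from the question) that ends in text: re.split returns one more piece than re.findall, so plain zip silently drops the trailing piece, while zip_longest with fillvalue='' keeps it.

```python
import re
from itertools import chain, zip_longest

NUMBER = r'\d+(?:\.\d*)?'
s = "Nicobar 638 Carnicobar"  # hypothetical input with text after the last number

texts = re.split(NUMBER, s)      # ['Nicobar ', ' Carnicobar']
numbers = re.findall(NUMBER, s)  # ['638']
# For a pattern that never matches the empty string, split() yields one more piece.
assert len(texts) == len(numbers) + 1

# zip stops at the shorter input, so ' Carnicobar' is lost.
zipped = [x.strip() for x in chain.from_iterable(zip(texts, numbers)) if x.strip()]
# zip_longest pads the shorter input with '' and keeps every piece.
longest = [x.strip()
           for x in chain.from_iterable(zip_longest(texts, numbers, fillvalue=''))
           if x.strip()]

print(zipped)   # ['Nicobar', '638']
print(longest)  # ['Nicobar', '638', 'Carnicobar']
```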

Upvotes: 1
