krzna

Reputation: 195

Python regex to split based on commas that follow numbers

I have a large file that I need to load into a list of strings. Each element should contain the text up to a ',' that immediately follows a number.

For example:

this is some text, value 45789, followed by, 1245, and more text 78965, more random text 5252,

This should become:

["this is some text, value 45789", "followed by, 1245", "and more text 78965", "more random text 5252"]

I am currently doing re.sub(r'([0-9]+),', '~', <input-string>) and then splitting on '~' (since my file doesn't contain ~), but this throws out the numbers before the commas. Any thoughts?

Upvotes: 2

Views: 143

Answers (2)

falsetru

Reputation: 369074

You can use re.split with a positive look-behind assertion, which matches only the comma and leaves the preceding digits in place:

>>> import re
>>> 
>>> text = 'this is some text, value 45789, followed by, 1245, and more text 78965, more random text 5252,'
>>> re.split(r'(?<=\d),', text)
['this is some text, value 45789',
 ' followed by, 1245',
 ' and more text 78965',
 ' more random text 5252',
 '']
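For comparison, here is a small sketch of why the re.sub attempt from the question loses the numbers: the group ([0-9]+) is part of the match, so replacing the match with '~' alone discards the digits. Writing the captured group back with \1 keeps them. The names marked, via_sub and via_split are only illustrative, and this still assumes '~' never occurs in the input.

```python
import re

text = ('this is some text, value 45789, followed by, 1245, '
        'and more text 78965, more random text 5252,')

# The digits are inside the match, so the replacement must restore them:
# r'\1~' writes the captured digits back, followed by the '~' marker.
marked = re.sub(r'([0-9]+),', r'\1~', text)
via_sub = marked.split('~')

# The look-behind split consumes only the comma, so no marker is needed.
via_split = re.split(r'(?<=\d),', text)

# Both approaches yield the same pieces for this input.
print(via_sub == via_split)
```

The look-behind version is usually the simpler choice, since it needs no placeholder character at all.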

Upvotes: 2

Mmm Donuts

Reputation: 10285

If you want it to deal with spaces as well, do this:

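import re

string = "  blah, lots  ,  of ,  spaces, here "
# Split on leading whitespace, on commas with any surrounding whitespace, or on trailing whitespace
pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
# Drop the empty strings produced at either end
result = [x for x in pattern.split(string) if x]
print(result)
# ['blah', 'lots', 'of', 'spaces', 'here']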
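Note that the pattern above splits on every comma, not only on commas that follow a number. A possible way to combine the whitespace handling with the question's requirement is sketched below. It is only a sketch: it assumes the separator is a comma directly after a digit, optionally followed by whitespace, and it reuses the sample text from the question.

```python
import re

text = ('this is some text, value 45789, followed by, 1245, '
        'and more text 78965, more random text 5252,')

# (?<=\d) requires a digit just before the comma; \s* also consumes
# any whitespace after it, so the pieces need no separate strip().
pattern = re.compile(r'(?<=\d),\s*')

# The trailing comma produces one empty string at the end; filter it out.
result = [piece for piece in pattern.split(text) if piece]
print(result)
```

Commas that do not follow a digit, such as the one in "followed by, 1245", are left inside their piece.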

Upvotes: 0
