Reputation: 1017
I want to split a column in a pandas DataFrame, and I am using this code:
df['entry'] = df['entry'].str.split('.')
Now the problem is that I want to split bigger text elements such as:
I am content. I am another content.
But in the data there is also stuff like this:
I am 10.2 content.
I don't want to split the numbers. So I would need some conditional such as:
If dot between numbers, don't split.
How can I do this with pandas?
Upvotes: 0
Views: 196
Reputation: 91518
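Use negative lookarounds, so that a dot is not treated as a separator when a digit sits directly before or after it:
Update: a second lookbehind also skips a dot that follows a two-letter word, to deal with "I am St. Content."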
import re

rx = re.compile(r'(?<!\d)(?<!\b\w\w)\.(?!\d)')
text = 'I am content. I am another content. I am 10.2 content. I am St. Content.'
parts = rx.split(text)  # named `parts` so the built-in `str` is not shadowed
print(parts)
Output:
['I am content', ' I am another content', ' I am 10.2 content', ' I am St. Content', '']
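To apply this to the DataFrame column from the question, one option is to run the compiled pattern over each cell with `Series.apply`. The following is only a sketch: the sample rows and the helper name `split_sentences` are made up for illustration, and it assumes every cell in `entry` is a string (no NaN values). The helper also trims whitespace and drops the empty string that a trailing dot leaves behind.

```python
import re

import pandas as pd

# Same pattern as above: do not split on a dot that touches a digit
# or that directly follows a two-letter word such as "St".
rx = re.compile(r'(?<!\d)(?<!\b\w\w)\.(?!\d)')


def split_sentences(text):
    """Split text on sentence-ending dots, trimming and dropping empty parts."""
    return [part.strip() for part in rx.split(text) if part.strip()]


# Hypothetical sample data; 'entry' is the column name used in the question.
df = pd.DataFrame({'entry': [
    'I am content. I am another content.',
    'I am 10.2 content.',
    'I am St. Content.',
]})

# Each cell becomes a list of sentence fragments.
df['entry'] = df['entry'].apply(split_sentences)
print(df['entry'].tolist())
```

If one fragment per row is preferred over a list per cell, `df.explode('entry')` can be called afterwards. Note that the pattern also leaves a dot alone when a digit is on only one side of it, so a sentence ending in a number (for example "I am 10.") would not be split there.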
Upvotes: 2