Felix

Reputation: 1017

Pandas conditional split

I want to split a column in a pandas DataFrame, and I am using this code:

df['entry'] = df['entry'].str.split('.')

Now the problem is that I want to split bigger text elements such as:

I am content. I am another content.

But in the data there is also stuff like this:

I am 10.2 content.

I don't want to split the numbers. So I would need some conditional such as:

If dot between numbers, don't split.

How can I do this with pandas?

Upvotes: 0

Views: 196

Answers (1)

Toto

Reputation: 91518

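Use negative lookarounds, so that a dot only counts as a split point when it is not part of a number:

Updated to also deal with abbreviations such as "I am St. Content."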

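import re

# Split on a dot unless it follows a digit, follows a two-letter word such as "St", or precedes a digit.
rx = re.compile(r'(?<!\d)(?<!\b\w\w)\.(?!\d)')
text = 'I am content. I am another content. I am 10.2 content. I am St. Content.'
parts = rx.split(text)
print(parts)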

Output:

['I am content', ' I am another content', ' I am 10.2 content', ' I am St. Content', '']
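Since the question asks about a DataFrame column, here is a minimal sketch of applying the same pattern with pandas. The sample data and the helper name `split_sentences` are made up for illustration, and the sketch assumes every value in `entry` is a string (missing values would need separate handling). It also trims whitespace and drops the empty string left by the final dot.

```python
import re

import pandas as pd

# Same pattern as above: split on a dot unless it follows a digit,
# follows a two-letter word (e.g. "St"), or precedes a digit.
rx = re.compile(r'(?<!\d)(?<!\b\w\w)\.(?!\d)')

# Hypothetical sample data; only the column name 'entry' comes from the question.
df = pd.DataFrame({'entry': [
    'I am content. I am another content.',
    'I am 10.2 content.',
    'I am St. Content.',
]})


def split_sentences(text):
    """Split on the pattern, trim whitespace, and drop empty pieces."""
    return [part.strip() for part in rx.split(text) if part.strip()]


# Each cell becomes a list of sentence-like pieces.
df['entry'] = df['entry'].apply(split_sentences)
print(df['entry'].tolist())
```

Using `apply` with the compiled pattern should work regardless of pandas version. Recent pandas releases (1.4 and later, as far as I know) also let you pass a regular expression to `Series.str.split` with `regex=True`, which may be more convenient if you do not need the cleanup step.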

Upvotes: 2
