Reputation: 1309
How can I split on the dot after a number and add just two new columns to the original df?
df.name
0 1. The start
1 2. Today's world
2 3. Today's world vs. yesterday.
...
20 20. The change
Expected Output:
number title
0 1 The start
1 2 Today's world
2 3 Today's world vs. yesterday.
...
20 20 The change
I tried:
df[['number', 'title']] = df.name.str.split('(\d+.)', expand=True)
Upvotes: 1
Views: 71
Reputation: 862601
Use Series.str.extract to capture the integer before the . and the rest of the value as two groups:
df[['number', 'title']] = df.name.str.extract(r'^(\d+)\.\s+(.*)')
print(df)
name number title
0 1. The start 1 The start
1 2. Today's world 2 Today's world
2 3. Today's world vs. yesterday. 3 Today's world vs. yesterday.
20 20. The change 20 The change
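Note that str.extract returns strings, so the number column is not numeric yet. A minimal sketch, assuming you want integers and that every row matches the pattern (the sample frame below is rebuilt from the question's data purely for illustration):

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption for illustration)
df = pd.DataFrame({"name": ["1. The start", "2. Today's world",
                            "3. Today's world vs. yesterday.", "20. The change"]})

# Group 1: the leading digits; group 2: everything after the dot and whitespace
df[['number', 'title']] = df.name.str.extract(r'^(\d+)\.\s+(.*)')

# extract yields strings; convert only if a numeric column is needed.
# astype(int) raises if a row did not match (NaN), so this assumes all rows match.
df['number'] = df['number'].astype(int)
print(df[['number', 'title']])
```

If some rows might not start with a number, pd.to_numeric(df['number'], errors='coerce') is probably the safer conversion, since it leaves NaN instead of raising.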
Upvotes: 2
Reputation: 36390
You need a zero-length assertion (here a lookbehind) and you need to escape the ., as it has a special meaning in regular expression patterns, i.e.:
import pandas as pd
df = pd.DataFrame({"name":["1. The start","2. Today's world","3. Today's world vs. yesterday."]})
df[['number', 'title']] = df.name.str.split(r'(?<=\d)\.\s*', n=1, expand=True)
print(df)
Output:
name number title
0 1. The start 1 The start
1 2. Today's world 2 Today's world
2 3. Today's world vs. yesterday. 3 Today's world vs. yesterday.
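To see why the pattern from the question misbehaves, here is a rough sketch using the standard re module on one of the sample strings, on the assumption that pandas applies the pattern the same way re.split does. The variable names are only illustrative:

```python
import re

s = "3. Today's world vs. yesterday."

# Pattern from the question: the capturing group keeps the delimiter in the
# result, and the unescaped '.' matches any character, so three pieces come back.
attempt = re.split(r'(\d+.)', s)
print(attempt)

# Lookbehind: split on a literal dot only when a digit precedes it.
# The dots in "vs." and at the end are left alone.
lookbehind = re.split(r'(?<=\d)\.', s)
print(lookbehind)

# Also consume the whitespace after the dot and stop after the first split,
# so the title has no leading space and later "digit + dot" text is not split.
trimmed = re.split(r'(?<=\d)\.\s*', s, maxsplit=1)
print(trimmed)
```

The three pieces from the first pattern become three columns with expand=True, which is why assigning them to two column names fails. Splitting on the dot alone leaves a leading space in the title, so either consume the whitespace in the pattern or call str.strip() on the title afterwards.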
Upvotes: 1