asd

Reputation: 1309

str.split only at a specific place

How can I split on the dot that follows the number, so that just two columns are added to the original df?

df.name
0    1. The start
1    2. Today's world
2    3. Today's world vs. yesterday.
...
20   20. The change

Expected Output:

     number   title
0     1       The start
1     2       Today's world
2     3       Today's world vs. yesterday.
...
20    20      The change

I tried:

df[['number', 'title']] = df.name.str.split('(\d+.)', expand=True)
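A quick way to see why this attempt does not give two columns: in current pandas versions a multi-character pattern is treated as a regular expression and split with Python's re module, and re.split also returns the text matched by a capturing group. The unescaped . additionally matches any character, not just a dot. A minimal sketch with plain re, assuming str.split behaves the same way for this pattern:

```python
import re

# The pattern from the attempt: a capturing group around "\d+." (the dot is unescaped)
attempt = r'(\d+.)'

# re.split keeps the text matched by the capturing group in its result,
# so each row yields three pieces: '', the number with its dot, and the rest
pieces = re.split(attempt, "1. The start")
print(pieces)  # ['', '1.', ' The start']
```

Three pieces per row do not fit into the two target columns, and the title still keeps its leading space.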

Upvotes: 1

Views: 71

Answers (2)

jezrael

Reputation: 862601

Use Series.str.extract to capture the integer before the . and the rest of the text:

df[['number', 'title']] = df.name.str.extract(r'(^\d+)\.\s+(.*)')

print(df)
                               name number                          title
0                      1. The start      1                      The start
1                  2. Today's world      2                  Today's world
2   3. Today's world vs. yesterday.      3   Today's world vs. yesterday.
20                   20. The change     20                     The change
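A self-contained sketch of the same str.extract approach, using the sample rows shown in the question. Two variations are assumptions of this sketch rather than part of the answer: named groups (?P<number>...) and (?P<title>...), which str.extract turns into column names, and an astype(int) step, since extract returns the number as text.

```python
import pandas as pd

df = pd.DataFrame({"name": ["1. The start", "2. Today's world",
                            "3. Today's world vs. yesterday.", "20. The change"]})

# Named groups become the column names of the extracted DataFrame
parts = df["name"].str.extract(r"^(?P<number>\d+)\.\s+(?P<title>.*)$")

# extract returns strings; convert the number column to integers
parts["number"] = parts["number"].astype(int)

# Attach the two new columns to the original frame
df = df.join(parts)
print(df)
```

Only the first match is extracted and the pattern is anchored at the start, so the later dot in "vs." is left inside the title.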

Upvotes: 2

Daweo

Reputation: 36390

You need a zero-length assertion (a lookbehind), and you need to escape the . because it has a special meaning in regular expression patterns, e.g.:

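import pandas as pd
df = pd.DataFrame({"name":["1. The start","2. Today's world","3. Today's world vs. yesterday."]})
# (?<=\d) is a zero-length lookbehind: a digit must precede the dot, but it is not consumed.
# \s* also drops the space after the dot, and n=1 splits only at the first match.
df[['number', 'title']] = df.name.str.split(r'(?<=\d)\.\s*', n=1, expand=True)
print(df)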

Output:

                              name number                          title
0                     1. The start      1                      The start
1                 2. Today's world      2                  Today's world
2  3. Today's world vs. yesterday.      3   Today's world vs. yesterday.
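To see what the lookbehind buys, here is a minimal sketch with Python's re module directly (pandas uses re for regex splits, so the behaviour should carry over). It compares a plain escaped dot with the lookbehind pattern. Here \s* also consumes the space after the dot, and maxsplit=1 stops after the first match, so a later "digit followed by a dot" inside a title would not produce a third piece.

```python
import re

rows = ["1. The start", "2. Today's world", "3. Today's world vs. yesterday."]

# A plain "\." splits at every dot, including the one in "vs." and the final one
naive = re.split(r"\.", rows[2])
print(naive)  # ['3', " Today's world vs", ' yesterday', '']

# "(?<=\d)" only checks that a digit comes right before the dot; the digit is not
# consumed, so it stays in the first piece. "\s*" also swallows the following space.
pattern = r"(?<=\d)\.\s*"
split_rows = [re.split(pattern, row, maxsplit=1) for row in rows]
for parts in split_rows:
    print(parts)
```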

Upvotes: 1
