JungleDiff

Reputation: 3505

Pandas: Split a string and then create a new column?


Let's say you have a column Col1 with values such as Name_John, Name_Jay and Name_Sherry.

How do you create a new column 'Col2' by splitting each Col1 value at the _ and keeping the part after it?

Upvotes: 20

Views: 38705

Answers (4)

mozway

Reputation: 262484

Chaining str.split with str slicing is not very efficient. A better option is str.removeprefix (available since pandas 1.4):

df['Col2'] = df['Col1'].str.removeprefix('Name_')

or use str.extract to capture the trailing part that contains no underscore:

df['Col2'] = df['Col1'].str.extract('([^_]*$)', expand=False)

Output:

          Col1    Col2
0    Name_John    John
1     Name_Jay     Jay
2  Name_Sherry  Sherry
3       Cherry  Cherry
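Below is a self-contained version of the two approaches above. It assumes pandas 1.4 or later, where Series.str.removeprefix exists, and uses the four sample values from the output table. removeprefix only strips the literal 'Name_' prefix. The regex instead keeps whatever follows the last underscore, whatever the prefix happens to be.

```python
import pandas as pd

# Sample data matching the output table above
df = pd.DataFrame({'Col1': ['Name_John', 'Name_Jay', 'Name_Sherry', 'Cherry']})

# Drop a fixed leading 'Name_'; values without that prefix are returned unchanged
df['Col2'] = df['Col1'].str.removeprefix('Name_')

# Capture the trailing run of characters that contains no underscore
col2_extract = df['Col1'].str.extract(r'([^_]*$)', expand=False)

print(df)
print(col2_extract.tolist() == df['Col2'].tolist())  # both approaches agree here
```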

Timings (100K rows):

# str.removeprefix
64.7 ms ± 5.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# str.extract
147 ms ± 33.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# str.split + str[-1]
164 ms ± 26.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
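The timings above come from the answerer's machine. The sketch below is one rough way to rerun the comparison yourself. It assumes a 100K-row Series built by repeating the four sample values. Absolute numbers will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: the four sample values repeated to 100K rows
s = pd.Series(['Name_John', 'Name_Jay', 'Name_Sherry', 'Cherry'] * 25_000)

candidates = {
    'str.removeprefix': lambda: s.str.removeprefix('Name_'),
    'str.extract': lambda: s.str.extract(r'([^_]*$)', expand=False),
    'str.split + str[-1]': lambda: s.str.split('_').str[-1],
}

results = {}
for label, func in candidates.items():
    # Best of 3 single calls, converted to milliseconds
    best = min(timeit.repeat(func, number=1, repeat=3))
    results[label] = best * 1000
    print(f'{label:>22}: {results[label]:.1f} ms')
```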

Upvotes: 0

hui chen

Reputation: 1074

You can use the str.split() method with the expand=True argument, which returns the pieces as separate columns.

For example:

ncaa[['Win', 'Lose']] = ncaa['Record'].str.split('-', expand=True)
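Here is a self-contained sketch of the expand=True approach. The ncaa frame and its Record values such as '10-5' are hypothetical. The second half applies the same idea to Col1 with n=1, which splits only at the first underscore so the result has exactly two columns. A value with no separator, such as 'Cherry', lands entirely in the first column and gets a missing value in the second.

```python
import pandas as pd

# Hypothetical win-loss records in 'W-L' form
ncaa = pd.DataFrame({'Record': ['10-5', '8-7', '12-3']})
ncaa[['Win', 'Lose']] = ncaa['Record'].str.split('-', expand=True)

df = pd.DataFrame({'Col1': ['Name_John', 'Name_Jay', 'Cherry']})
# n=1 limits the split to the first '_', so expand=True yields two columns
df[['Prefix', 'Col2']] = df['Col1'].str.split('_', n=1, expand=True)

print(ncaa)
print(df)
```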

Upvotes: 14

Scott Boston

Reputation: 153550

Edit to handle strings without '_':

import numpy as np

df['Col2'] = np.where(df['Col1'].str.contains('_'),
                      df['Col1'].str.split('_').str[1],
                      df['Col1'])

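Or, as COLDSPEED suggests in the comments, take the last piece of the split, which is the whole string when there is no '_':

df['Col2'] = df['Col1'].str.split('_').str[-1]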
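The sketch below runs both variants side by side on the sample values plus a 'Cherry' row that has no underscore. The column names Col2_where and Col2_last are illustrative only. The two variants agree here, but they would differ for a value with more than one underscore: 'a_b_c' gives 'b' with .str[1] and 'c' with .str[-1].

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['Name_John', 'Name_Jay', 'Name_Sherry', 'Cherry']})

# Take the piece after '_' where one exists, otherwise keep the original value
df['Col2_where'] = np.where(df['Col1'].str.contains('_'),
                            df['Col1'].str.split('_').str[1],
                            df['Col1'])

# The last piece of the split is the whole string when there is no '_'
df['Col2_last'] = df['Col1'].str.split('_').str[-1]

print(df)
```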

You can use the .str accessor with indexing:

df['Col2'] = df['Col1'].str.split('_').str[1]

Example:

import pandas as pd

df = pd.DataFrame({'Col1': ['Name_John', 'Name_Jay', 'Name_Sherry']})
df['Col2'] = df['Col1'].str.split('_').str[1]

Output:

          Col1    Col2
0    Name_John    John
1     Name_Jay     Jay
2  Name_Sherry  Sherry
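There is one caveat with .str[1]. A value that contains no '_' splits into a one-element list, so indexing position 1 yields NaN instead of the original string. The sketch below adds a 'Cherry' row to show this. It is the case that the np.where and .str[-1] variants at the top of this answer handle.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['Name_John', 'Name_Jay', 'Name_Sherry', 'Cherry']})
df['Col2'] = df['Col1'].str.split('_').str[1]

# 'Cherry' has no '_', so its split result has no element at position 1
print(df)
print(df['Col2'].isna().tolist())
```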

Upvotes: 34

BENY

Reputation: 323386

I think this will work. The if...else logic handles your additional request: when a value does not contain '_', keep the original.

df['Col2'] = df['Col1'].apply(lambda x: x.split('_')[1] if '_' in x else x)
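The sketch below is a self-contained run of the apply approach on the same four sample values used in the other answers. It assumes every value in Col1 is a string. A NaN (a float) in the column would make the '_' in x test raise a TypeError, so missing values would need handling first.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['Name_John', 'Name_Jay', 'Name_Sherry', 'Cherry']})

# Plain-Python logic per value: second piece if '_' is present, else unchanged
df['Col2'] = df['Col1'].apply(lambda x: x.split('_')[1] if '_' in x else x)

print(df)
```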

Upvotes: 7
