Reputation: 3505
Let's say you have a column Col1.
How do you create a new column 'Col2' by splitting the string values in Col1 at the '_' character?
Upvotes: 20
Views: 38705
Reputation: 262484
Chaining split and slicing with str is not very efficient. A better option is str.removeprefix:
df['Col2'] = df['Col1'].str.removeprefix('Name_')
or use str.extract to capture the trailing part that contains no underscore:
df['Col2'] = df['Col1'].str.extract('([^_]*$)', expand=False)
Output:
Col1 Col2
0 Name_John John
1 Name_Jay Jay
2 Name_Sherry Sherry
3 Cherry Cherry
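Here is a minimal, self-contained sketch of both approaches. The first four values match the output table above. Two hypothetical extra values ('Name_Mary_Ann', 'Other_Bob') are added to show where the approaches differ: removeprefix strips only the literal 'Name_' prefix, while the extract pattern keeps whatever follows the last underscore. Series.str.removeprefix requires pandas 1.4 or later.

```python
import pandas as pd

# First four values match the output table; the last two are hypothetical
# additions that show where the two methods diverge.
df = pd.DataFrame({'Col1': ['Name_John', 'Name_Jay', 'Name_Sherry', 'Cherry',
                            'Name_Mary_Ann', 'Other_Bob']})

# Strip the literal 'Name_' prefix where present; other values pass through unchanged.
df['prefix'] = df['Col1'].str.removeprefix('Name_')

# Capture the trailing run of non-underscore characters (the text after the last '_').
df['extract'] = df['Col1'].str.extract('([^_]*$)', expand=False)

print(df)
```

On the question's data both give the same Col2. removeprefix fits when the prefix is fixed and known; extract fits when you want the last underscore-separated piece whatever the prefix is.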
Timings (100K rows):
# str.removeprefix
64.7 ms ± 5.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# str.extract
147 ms ± 33.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# str.split + str[-1]
164 ms ± 26.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
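A comparison like the one above can be run with something along these lines. The 100K rows here are an assumption, built by repeating the four sample values, because the original benchmark data is not shown. The sketch first checks that the three methods agree on this data. It then times each one with the standard-library timeit module. Absolute numbers will vary with the machine and pandas version.

```python
import timeit

import pandas as pd

# Assumed benchmark data: 100K rows made by repeating the sample values.
s = pd.Series(['Name_John', 'Name_Jay', 'Name_Sherry', 'Cherry'] * 25_000)

candidates = {
    'str.removeprefix': lambda: s.str.removeprefix('Name_'),
    'str.extract': lambda: s.str.extract('([^_]*$)', expand=False),
    'str.split + str[-1]': lambda: s.str.split('_').str[-1],
}

# Check that all methods produce the same values, so the timing is like-for-like.
reference = candidates['str.removeprefix']()
for name, fn in candidates.items():
    assert fn().tolist() == reference.tolist(), name
    seconds = timeit.timeit(fn, number=5) / 5  # mean seconds per call
    print(f'{name}: {seconds * 1000:.1f} ms per call')
```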
Upvotes: 0
Reputation: 1074
You can simply use the str.split() method with the expand=True argument. For example:
ncaa[['Win', 'Lose']] = ncaa['Record'].str.split('-', expand=True)
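The following is a runnable sketch of the same idea. The ncaa frame and its 'Record' values are hypothetical, since the answer does not show them. The second half applies expand=True to the question's Col1. With expand=True the result is a DataFrame with one column per piece, and each piece is still a string.

```python
import pandas as pd

# Hypothetical data: 'Record' holds win-loss strings such as '10-2'.
ncaa = pd.DataFrame({'Record': ['10-2', '7-5', '12-0']})

# expand=True returns a DataFrame, so both pieces can be assigned at once.
ncaa[['Win', 'Lose']] = ncaa['Record'].str.split('-', expand=True)
# The pieces are strings; convert them if you need numbers.
ncaa[['Win', 'Lose']] = ncaa[['Win', 'Lose']].astype(int)

# Applied to the question: n=1 splits at the first '_' only,
# and column 1 of the expanded result holds the part after it.
df = pd.DataFrame({'Col1': ['Name_John', 'Name_Jay', 'Name_Sherry']})
df['Col2'] = df['Col1'].str.split('_', n=1, expand=True)[1]

print(ncaa)
print(df)
```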
Upvotes: 14
Reputation: 153550
Edit to handle strings without '_':
import numpy as np

df['Col2'] = np.where(df['Col1'].str.contains('_'),
                      df['Col1'].str.split('_').str[1],
                      df['Col1'])
Or, as COLDSPEED suggests in the comments:
df['Col2'] = df['Col1'].str.split('_').str[-1]
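Below is a self-contained run of both variants. The sample data includes 'Cherry' (no underscore) and a hypothetical 'Name_Mary_Ann' to show how they differ: str[1] takes the second piece, while str[-1] takes the last one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['Name_John', 'Cherry', 'Name_Mary_Ann']})

# np.where picks the second split piece where '_' occurs, else the original value.
df['Col2'] = np.where(df['Col1'].str.contains('_'),
                      df['Col1'].str.split('_').str[1],
                      df['Col1'])

# str[-1] takes the last piece; a value without '_' splits into a
# one-element list, so it comes back unchanged without any condition.
df['Col2_last'] = df['Col1'].str.split('_').str[-1]

print(df)
```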
You can use the .str accessor with indexing:
df['Col2'] = df['Col1'].str.split('_').str[1]
Example:
import pandas as pd
df = pd.DataFrame({'Col1': ['Name_John', 'Name_Jay', 'Name_Sherry']})
df['Col2'] = df['Col1'].str.split('_').str[1]
Output:
Col1 Col2
0 Name_John John
1 Name_Jay Jay
2 Name_Sherry Sherry
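The next sketch covers how .str[1] behaves on edge cases that the example above does not include. It adds two hypothetical values. A value without '_' yields a missing value (NaN). Splitting with n=1 keeps everything after the first '_' together.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['Name_John', 'Cherry', 'Name_Mary_Ann']})

# Plain split: 'Cherry' has no second piece, so .str[1] gives NaN,
# and 'Name_Mary_Ann' yields only 'Mary'.
df['Col2'] = df['Col1'].str.split('_').str[1]

# n=1 splits at the first '_' only, so 'Mary_Ann' stays together.
df['Col2_n1'] = df['Col1'].str.split('_', n=1).str[1]

print(df)
```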
Upvotes: 34
Reputation: 323386
I think this will work. The if...else logic handles your additional request: when a value has no '_', keep the original.
df['Col2'] = df['Col1'].apply(lambda x: x.split('_')[1] if x.find('_') != -1 else x)
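Here is a runnable version of the apply approach on sample data that includes a value without '_'. The second line is an equivalent spelling that tests membership with Python's in operator instead of str.find.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['Name_John', 'Name_Jay', 'Name_Sherry', 'Cherry']})

# Take the second '_'-separated piece when there is an underscore; otherwise keep the value.
df['Col2'] = df['Col1'].apply(lambda x: x.split('_')[1] if x.find('_') != -1 else x)

# Same logic, using `in` instead of str.find.
df['Col2_in'] = df['Col1'].apply(lambda x: x.split('_')[1] if '_' in x else x)

print(df)
```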
Upvotes: 7