I would like to split the strings in a specific column of a DataFrame on " - " and save the last part in a new column. This works on plain strings:
s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'
print(s0.split(" - ")[-1]) # works
print(s1.split(" - ")[-1])
print(s2.split(" - ")[-1])
But not with a data frame:
import pandas as pd
df = pd.DataFrame([s0, s1, s2], columns=['title'])
df['diagnosis'] = df['title'].str.split(' - ')[-1] # KeyError: -1
print(df['diagnosis'])
What am I doing wrong?
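The KeyError comes from where [-1] is applied: df['title'].str.split(' - ') returns a Series whose elements are lists, and [-1] on that Series selects by index label, not by position inside each list. The default index (0, 1, 2) has no label -1, hence KeyError: -1. Going through the .str accessor a second time applies the indexing to every list element-wise. A minimal sketch, reusing the question's strings:

```python
import pandas as pd

s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'
df = pd.DataFrame([s0, s1, s2], columns=['title'])

# .str.split gives a Series of lists; .str[-1] takes the last item of each list
df['diagnosis'] = df['title'].str.split(' - ').str[-1]

# Equivalent: split at most once per string, starting from the right
alt = df['title'].str.rsplit(' - ', n=1).str[-1]

print(df['diagnosis'])
```

Strings without " - " come back unchanged, because splitting them yields a one-element list.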
Instead of splitting each string into a list of chunks, you can use str.rfind to locate the last '-' and keep everything after it:
In [104]: df['title'].apply(lambda s: s[s.rfind('-') + 1:].strip())
Out[104]:
0 Pharyngitis
1 Nephropathy
2 Metastatic Liver Cancer
Name: title, dtype: object
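One caveat: rfind('-') matches any hyphen, so a hyphenated word inside the diagnosis gets cut. Searching for the full ' - ' separator and skipping its length avoids this; since rfind() returns -1 when the separator is absent, that case needs its own branch. A sketch, where last_part and the 'X-ray' title are hypothetical illustrations:

```python
import pandas as pd

def last_part(s, sep=' - '):
    # Hypothetical helper: text after the last occurrence of sep, or s unchanged
    i = s.rfind(sep)
    if i == -1:  # separator not present
        return s
    return s[i + len(sep):]

titles = ['Chest pain - Abnormal X-ray', 'Metastatic Liver Cancer']
df = pd.DataFrame(titles, columns=['title'])

print(df['title'].apply(lambda s: s[s.rfind('-') + 1:].strip()).tolist())
# ['ray', 'Metastatic Liver Cancer']
print(df['title'].apply(last_part).tolist())
# ['Abnormal X-ray', 'Metastatic Liver Cancer']
```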
Write a function that returns the last part of a single string, then apply it to the column:
import pandas as pd
s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'
def f(x):
    return x.split(" - ")[-1]
df = pd.DataFrame([s0, s1, s2], columns=['title'])
df['diagnosis'] = df['title'].apply(f)
print(df['diagnosis'])
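If the column can contain missing values, apply(f) fails on them, since None or NaN has no split method (AttributeError). A minimal sketch of a guarded variant, assuming missing entries should simply stay missing; the None row is made-up data for illustration:

```python
import pandas as pd

def f(x):
    # Pass missing or non-string values through unchanged instead of calling .split
    if not isinstance(x, str):
        return x
    return x.split(" - ")[-1]

titles = ['34 years old woman with pain in her XXX - Pharyngitis', None]
df = pd.DataFrame(titles, columns=['title'])
df['diagnosis'] = df['title'].apply(f)
print(df['diagnosis'])
```

The vectorized .str methods already leave missing entries missing rather than raising, so this guard is only needed with apply.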
You can use apply and a lambda here:
import pandas as pd
s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'
df = pd.DataFrame([s0, s1, s2], columns=['title'])
df['diagnosis'] = df['title'].apply(lambda x: x.split(' - ')[-1])
print(df['diagnosis'])
Prints:
0 Pharyngitis
1 Nephropathy
2 Metastatic Liver Cancer
Name: diagnosis, dtype: object
If you would rather get an empty string when the title contains no " - ", change the line to:
df['diagnosis'] = df['title'].apply(lambda x: x.split(' - ')[-1] if ' - ' in x else '')
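The same empty-string behaviour can also be had without apply, via Series.str.rpartition, which splits each string at the last ' - ' into three columns (before, separator, after). When the separator is absent, the first two columns are empty strings and the third holds the whole title. A sketch:

```python
import pandas as pd

s0 = '34 years old woman with pain in her XXX - Pharyngitis'
s1 = '67 years old man with xxx - yyy zzz - Nephropathy'
s2 = 'Metastatic Liver Cancer'
df = pd.DataFrame([s0, s1, s2], columns=['title'])

# Columns 0, 1, 2: text before the last ' - ', the separator itself, text after it
parts = df['title'].str.rpartition(' - ')
# Keep the text after the separator only where a separator was found
df['diagnosis'] = parts[2].where(parts[1] != '', '')
print(df['diagnosis'].tolist())
# ['Pharyngitis', 'Nephropathy', '']
```

Using parts[2] without the .where(...) call returns the whole title for rows without a separator, matching the split(' - ')[-1] behaviour.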