Reputation: 5385
I have a pandas dataframe containing a column with strings (that are comma separated substrings). I want to remove some of the substrings and write the remaining ones to a new column in the same dataframe.
The code I have written to do this looks like this:
def remove_betas(df):
for index,row in df.iterrows():
list= row['Column'].split(',')
if 'substring' in list:
list.remove('beta-lactam')
New= (',').join(list)
elif not 'substring' in list:
New= (',').join(Gene_list)
return New
df['NewColumn'].iloc[index]=New
df.apply(remove_betas, axis=1)
When I run it, my new column contains only zeros. The thought behind this code is to get each string for each row in df, split it at comma into substrings and search the resulting list for the substring I want to remove. After removal, I join the list back together into a string and write that to a new column of df, at the same index position as the corresponding row.
What do I have to change to write the resulting substrings to a new column in the desired manner?
EDIT
By the way, I have tried to write a lambda expression as in how to compute a new column based on the values of other columns in pandas - python , but I cannot really figure out how to do everything in a vectorized function.
I also tried replacing the substring with nothing ( as in df.column.replace('x,?', '')
, but that does not work since I have to count the lists later. Therefore the substring must be removed as in list.remove('substring')
Upvotes: 1
Views: 2548
Reputation: 31171
Why not employing a one liner regex solution:
import re
df = pd.DataFrame({'col1':[3,4,5],'col2':['a,ben,c','a,r,ben','cat,dog'],'col3':[1,2,3]})
#In [220]: df
#Out[220]:
# col1 col2 col3
#0 3 a,ben,c 1
#1 4 a,r,ben 2
#2 5 cat,dog 3
df['new'] = df.col2.apply(lambda x: re.sub(',?ben|ben,?', '', x))
#In [222]: df
#Out[222]:
# col1 col2 col3 new
#0 3 a,ben,c 1 a,c
#1 4 a,r,ben 2 a,r
#2 5 cat,dog 3 cat,dog
Or just use replace
:
In [272]: df.col2.str.replace(',?ben|ben,?', '',case=False)
Out[272]:
0 a,c
1 a,r
2 cat,dog
Name: col2, dtype: object
Upvotes: 1