How to write a return value of a function into new column of a pandas dataframe

Question

I have a pandas dataframe containing a column of strings, each made of comma-separated substrings. I want to remove some of the substrings and write the remaining ones to a new column in the same dataframe.

The code I have written to do this looks like this:

def remove_betas(df):
    for index, row in df.iterrows():
        list = row['Column'].split(',')
        if 'beta-lactam' in list:
            list.remove('beta-lactam')
            New = ','.join(list)
        else:
            New = ','.join(list)
    return New
    df['NewColumn'].iloc[index] = New

df.apply(remove_betas, axis=1)

When I run it, my new column contains only zeros. The idea behind this code is to take the string in each row of df, split it at the commas into a list of substrings, and search that list for the substring I want to remove. After removing it, I join the list back into a string and write that string to a new column of df, at the same index position as the corresponding row.

What do I have to change to write the resulting substrings to a new column in the desired manner?
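A minimal sketch of the split/remove/join idea, assuming the column is called 'Column' and the item to drop is 'beta-lactam' as in the question's code (the sample rows and the helper name remove_beta are hypothetical). In the question's function, the assignment after return New can never run, so nothing is written to NewColumn. Here the function works on a single string and returns the result; Series.apply collects those return values into a Series that shares df's index, so assigning it to df['NewColumn'] puts each result in the row it came from. Unlike list.remove, which drops only the first match, the list comprehension drops every copy of the item.

```python
import pandas as pd

# Hypothetical sample data; 'Column' and 'beta-lactam' come from the question's code
df = pd.DataFrame({'Column': ['beta-lactam,tet,sul', 'tet,sul', 'beta-lactam']})

def remove_beta(value, item='beta-lactam'):
    """Split one comma-separated string, drop every copy of `item`, join the rest."""
    items = value.split(',')
    kept = [x for x in items if x != item]
    return ','.join(kept)

# Series.apply calls remove_beta once per cell and returns a Series with
# the same index as df, so each result lands in the row it came from.
df['NewColumn'] = df['Column'].apply(remove_beta)
print(df)
```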

EDIT

By the way, I have tried to write a lambda expression as in the question "how to compute a new column based on the values of other columns in pandas - python", but I cannot figure out how to do everything in a vectorized function.

I also tried replacing the substring with nothing (as in df.column.replace('x,?', '')), but that does not work because I need to count the list items later. Therefore the substring must be removed as in list.remove('substring').
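If the lists are needed later for counting, one option is to keep the split lists in their own column. This is a sketch with hypothetical data and made-up column names 'Items' and 'Count': Series.str.split(',') gives one Python list per row, a list comprehension drops the unwanted item, Series.str.len() counts the remaining items in each list, and Series.str.join(',') turns each list back into a string.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Column': ['beta-lactam,tet,sul', 'tet,sul', 'beta-lactam']})

# One Python list per row, e.g. ['beta-lactam', 'tet', 'sul']
lists = df['Column'].str.split(',')

# Drop the unwanted item from each list (all occurrences, not just the first)
df['Items'] = lists.apply(lambda items: [x for x in items if x != 'beta-lactam'])

# On a column of lists, .str.len() returns the length of each list
df['Count'] = df['Items'].str.len()

# On a column of lists, .str.join() joins each list back into one string
df['NewColumn'] = df['Items'].str.join(',')
```

Keeping the 'Items' column avoids re-splitting the strings every time the lists are needed.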

Colonel Beauvel · Accepted Answer

Why not use a one-liner regex solution?

import re
import pandas as pd

df = pd.DataFrame({'col1':[3,4,5],'col2':['a,ben,c','a,r,ben','cat,dog'],'col3':[1,2,3]})

#In [220]: df
#Out[220]:
#   col1     col2  col3
#0     3  a,ben,c     1
#1     4  a,r,ben     2
#2     5  cat,dog     3

df['new'] = df.col2.apply(lambda x: re.sub(',?ben|ben,?', '', x))

#In [222]: df
#Out[222]:
#   col1     col2  col3      new
#0     3  a,ben,c     1      a,c
#1     4  a,r,ben     2      a,r
#2     5  cat,dog     3  cat,dog

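Or just use the vectorized str.replace (pass regex=True explicitly; it is only the default before pandas 2.0):

In [272]: df.col2.str.replace(',?ben|ben,?', '', case=False, regex=True)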
Out[272]:
0        a,c
1        a,r
2    cat,dog
Name: col2, dtype: object
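One caveat with the pattern ',?ben|ben,?': it matches the characters "ben" anywhere, not only as a whole comma-separated item. For example, re.sub(',?ben|ben,?', '', 'a,bench') returns 'ach', and for 'ben,a' it returns ',a', because the first alternative already matches "ben" without a comma and the trailing comma is left behind. Below is a sketch of an exact-item alternative, assuming items are separated by bare commas with no spaces: wrap each string in commas so every item has a delimiter on both sides, remove ',ben' only when a comma follows it, then slice off the two added commas. It stays vectorized via Series.str.replace with regex=True; the extra rows 'ben,a' and 'a,bench' are hypothetical test cases.

```python
import re
import pandas as pd

df = pd.DataFrame({'col2': ['a,ben,c', 'a,r,ben', 'cat,dog', 'ben,a', 'a,bench']})

token = 'ben'
# ',ben' only matches when the next character is a comma, so longer items
# such as 'bench' are left alone. re.escape guards against tokens that
# contain regex metacharacters. This is case-sensitive; add case=False
# to str.replace below if needed.
pattern = ',' + re.escape(token) + '(?=,)'

# Pad with commas so the first and last items are delimited on both sides,
# remove every exact ',ben', then drop the padding character at each end.
padded = ',' + df['col2'] + ','
df['new'] = padded.str.replace(pattern, '', regex=True).str[1:-1]
print(df)
```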
