Marc

Reputation: 167

Remove string from pandas column dependent on another column

I have an example dataframe:

      col1                                   col2  
0     Hello, is it me you're looking for     Hello   
1     Hello, is it me you're looking for     me 
2     Hello, is it me you're looking for     looking 
3     Hello, is it me you're looking for     for   
4     Hello, is it me you're looking for     Lionel  
5     Hello, is it me you're looking for     Richie   

I would like to change col1 so that the string in col2 is removed from it, and return the amended dataframe. I would also like to remove one character before and one character after the string. For example, the desired output for index 1 would be:

      col1                                    col2
1     Hello, is ityou're looking for          me

I have tried using .apply() and .map() with a .replace() function, but I can't get .replace() to use df['col2'] as an argument. I also feel that this isn't the best way to go about it.

Any help? I'm mostly new to pandas and am looking to learn, so please ELI5.

Thanks!

Upvotes: 5

Views: 3531

Answers (3)

mucktruckpluckduck

Reputation: 1

Perhaps there is a more Pythonic or elegant way, but here is how I quickly did the above. This works best if you need flexibility to manipulate the strings and a quick fix matters more than performance.

I pulled the two columns out of the dataframe as individual Series:

col1Series = df['col1']
col2Series = df['col2']

Next create an empty list to store final string value:

rowxList = []

Iterate as follows to populate the list:

for x,y in zip(col1Series,col2Series):
    rowx  = x.replace(y,'')
    rowxList.append(rowx)

Finally, put rowxList back into the original dataframe as a new column. You could overwrite the old column directly, but it is safer to write to a new column, check the output against the original two columns, and then drop the old column once you no longer need it:

df['newCol'] = rowxList
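
The same loop can be written as a single list comprehension. This is only a minimal sketch: the three-row dataframe below is a cut-down copy of the example from the question, rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Cut-down copy of the question's dataframe (assumed here for illustration).
df = pd.DataFrame({
    "col1": ["Hello, is it me you're looking for"] * 3,
    "col2": ["Hello", "me", "Lionel"],
})

# Pair each sentence with the substring from the same row and strip it out.
df["newCol"] = [text.replace(sub, "") for text, sub in zip(df["col1"], df["col2"])]
```

Rows whose col2 value does not occur in col1 (such as "Lionel") come back unchanged, because str.replace does nothing when there is no match.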

Upvotes: 0

SCKU

Reputation: 833

To run a function on each row of a dataframe, you can use:

df.apply(func, axis=1)

func receives each row as a Series, so a plain removal looks like this:

df['col1'] = df.apply(lambda row: row['col1'].replace(row['col2'], ''), axis=1)

However, removing one character before and after needs more work, so define func:

def func(row):
    c1 = row['col1']  # full string
    c2 = row['col2']  # substring to remove
    find_index = c1.find(c2)  # index of the first occurrence of c2, or -1
    if find_index == -1:  # c2 not found
        return c1  # leave the string unchanged
    start_index = max(find_index - 1, 0)  # one character before, but never negative
    end_index = find_index + len(c2) + 1  # one character after; slicing past the end is safe
    # Cut out exactly this span; str.replace would also remove any other identical text.
    return c1[:start_index] + c1[end_index:]

then:

df['col1'] = df.apply(func, axis=1)

To avoid a SettingWithCopyWarning, use:

df = df.assign(col1=df.apply(func, axis=1))
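
For reference, here is a self-contained sketch of the whole approach on the question's data. The helper name remove_with_padding and its pad parameter are made up for this example; it cuts the matched span out by slicing rather than calling str.replace.

```python
import pandas as pd

def remove_with_padding(text, sub, pad=1):
    """Remove the first occurrence of sub plus pad characters on each side.

    Hypothetical helper for illustration, not part of pandas.
    """
    start = text.find(sub)
    if start == -1:
        return text  # substring not present, nothing to remove
    left = max(start - pad, 0)      # clamp so the slice never wraps around
    right = start + len(sub) + pad  # slicing past the end of a string is safe
    return text[:left] + text[right:]

sentence = "Hello, is it me you're looking for"
df = pd.DataFrame({
    "col1": [sentence] * 6,
    "col2": ["Hello", "me", "looking", "for", "Lionel", "Richie"],
})

# axis=1 hands each row to the lambda as a Series.
df["col1"] = df.apply(
    lambda row: remove_with_padding(row["col1"], row["col2"]), axis=1
)
```

Row 1 ends up as "Hello, is ityou're looking for", which is the output asked for in the question.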

Upvotes: 4

Magellan88

Reputation: 2573

My guess is that you were missing axis=1, which makes apply work on each row instead of each column:

import pandas as pd

A = """Hello, is it me you're looking for;Hello
Hello, is it me you're looking for;me
Hello, is it me you're looking for;looking
Hello, is it me you're looking for;for
Hello, is it me you're looking for;Lionel
Hello, is it me you're looking for;Richie
"""
df = pd.DataFrame([a.split(";") for a in A.split("\n") ][:-1],
                   columns=["col1","col2"])

df.col1 = df.apply( lambda x: x.col1.replace( x.col2, "" )  , axis=1)
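
To see what axis changes, a small sketch like the following may help. It uses a made-up two-row dataframe and returns the name of whatever Series the function receives in each mode.

```python
import pandas as pd

# Two-row stand-in dataframe, assumed here only for illustration.
df = pd.DataFrame({
    "col1": ["Hello, is it me", "you're looking for"],
    "col2": ["me", "for"],
})

# Default axis=0: the function receives each column as a Series,
# and that Series is named after the column.
by_column = df.apply(lambda s: s.name)

# axis=1: the function receives each row as a Series,
# and that Series is named after the row's index label.
by_row = df.apply(lambda s: s.name, axis=1)
```

With the default axis the function never sees col1 and col2 side by side, so it cannot look up one column's value while processing the other.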

Upvotes: 3
