user2110417

How to remove strings from a column that match strings in another DataFrame's column?

I have two dataframes. The first one, df1:

df1 = pd.DataFrame({
    'Sample': ['Sam1', 'Sam2', 'Sam3'],
    'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am,x,y,z,a']
})

df1

looks like:

    Sample  Value
0   Sam1    ak,b,c,k
1   Sam2    d,k,e,b,f,a
2   Sam3    am,x,y,z,a

The second one, df2:

df2 = pd.DataFrame({
    'Remove': ['ak', 'b', 'k', 'a', 'am']})
df2

looks like:

    Remove
0   ak
1   b
2   k
3   a
4   am

I want to remove from df1['Value'] every comma-separated string that matches a value in df2['Remove'].

Expected output is:

Sample    Value
Sam1      c
Sam2      d,e,f
Sam3      x,y,z

This code did not help me

Any help, thanks

Upvotes: 2

Views: 474

Answers (3)

azibom

Reputation: 1934

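This script will help you:

# Assign the whole column at once instead of using chained assignment
# (df1['Value'][index] = ...), which pandas does not reliably apply.
remove = set(df2['Remove'])
df1['Value'] = [list(set(elements.split(',')) - remove) for elements in df1['Value']]

It iterates over the column and takes the set difference between each row's elements and the values to remove. Each row ends up as a list, and because sets are unordered the elements may not keep their original order.
The complete code looks something like this: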

import pandas as pd

df1 = pd.DataFrame({
    'Sample': ['Sam1', 'Sam2', 'Sam3'],
    'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am,x,y,z,a']
})

df2 = pd.DataFrame({
    'Remove': ['ak', 'b', 'k', 'a', 'am']})

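# Assign the whole column at once instead of using chained assignment
# (df1['Value'][index] = ...), which pandas does not reliably apply.
remove = set(df2['Remove'])
df1['Value'] = [list(set(elements.split(',')) - remove) for elements in df1['Value']]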

print(df1)

Output:

  Sample      Value
0   Sam1        [c]
1   Sam2  [e, d, f]
2   Sam3  [y, x, z]
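
If the result should keep the original order and stay a comma-separated string, as in the question's expected output, a variant along these lines might work. This is only a sketch: the helper name drop_removed is made up for illustration, and it assumes the values contain no spaces around the commas.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Sample': ['Sam1', 'Sam2', 'Sam3'],
    'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am,x,y,z,a']
})
df2 = pd.DataFrame({'Remove': ['ak', 'b', 'k', 'a', 'am']})

# A set makes each membership test cheap.
remove = set(df2['Remove'])

def drop_removed(value):
    # Hypothetical helper: walk the items in order, keep those not in
    # `remove`, and join them back into one comma-separated string.
    return ','.join(item for item in value.split(',') if item not in remove)

df1['Value'] = [drop_removed(v) for v in df1['Value']]
print(df1)
```

Filtering with a generator instead of a set difference is what preserves the order here; the set is only used for lookups.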

Upvotes: 0

Rishabh Kumar

Reputation: 2430

Using apply as a one-liner:

df1['Value'] = df1['Value'].str.split(',').apply(lambda x:','.join([i for i in x if i not in df2['Remove'].values]))

Output:

>>> df1
  Sample   Value
0   Sam1       c
1   Sam2   d,e,f
2   Sam3   x,y,z
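
If the values may contain stray spaces after the commas (for example 'am, x,y,z,a'), an exact comparison would keep ' x' with its leading space. A possible variant, assuming that surrounding whitespace is never significant, strips each item before comparing:

```python
import pandas as pd

df1 = pd.DataFrame({'Sample': ['Sam3'], 'Value': ['am, x,y,z,a']})
df2 = pd.DataFrame({'Remove': ['ak', 'b', 'k', 'a', 'am']})

# Build the lookup set once instead of scanning the column for every item.
remove = set(df2['Remove'])

# Strip whitespace from each item, then keep only items not in `remove`.
df1['Value'] = df1['Value'].str.split(',').apply(
    lambda items: ','.join(s.strip() for s in items if s.strip() not in remove)
)
print(df1)
```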

Upvotes: 0

Ynjxsjmh

Reputation: 30022

You can use apply() to remove items from the df1 Value column if they appear in the df2 Remove column.

import pandas as pd

df1 = pd.DataFrame({
    'Sample': ['Sam1', 'Sam2', 'Sam3'],
    'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am,x,y,z,a']
})

df2 = pd.DataFrame({'Remove': ['ak', 'b', 'k', 'a', 'am']})

remove_list = df2['Remove'].values.tolist()

def remove_value(row, remove_list):
    keep_list = [val for val in row['Value'].split(',') if val not in remove_list]

    return ','.join(keep_list)

df1['Value'] = df1.apply(remove_value, axis=1, args=(remove_list,))

print(df1)

Output:

  Sample   Value
0   Sam1       c
1   Sam2   d,e,f
2   Sam3   x,y,z
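
As an alternative to a row-wise apply(), the same filtering can probably be done with pandas' own column operations: split, explode, isin, then group back by the original row. This is a sketch rather than a benchmarked solution; the intermediate names exploded and kept are just illustrative, and it assumes pandas 0.25 or later for explode().

```python
import pandas as pd

df1 = pd.DataFrame({
    'Sample': ['Sam1', 'Sam2', 'Sam3'],
    'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am,x,y,z,a']
})
df2 = pd.DataFrame({'Remove': ['ak', 'b', 'k', 'a', 'am']})

# One row per item; the original row label is repeated in the index.
exploded = df1['Value'].str.split(',').explode()

# Drop every item that appears in df2['Remove'].
kept = exploded[~exploded.isin(df2['Remove'])]

# Join the surviving items per original row. Rows whose items were all
# removed vanish from `kept`, so reindex restores them as empty strings.
df1['Value'] = (
    kept.groupby(level=0).agg(','.join).reindex(df1.index, fill_value='')
)
print(df1)
```

The reindex step matters only when a row loses all of its items; without it that row would end up as NaN.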

Upvotes: 0
