I have two dataframes
first one: df1
df1 = pd.DataFrame({
'Sample': ['Sam1', 'Sam2', 'Sam3'],
'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am, x,y,z,a']
})
df1
looks as:
  Sample        Value
0   Sam1     ak,b,c,k
1   Sam2  d,k,e,b,f,a
2   Sam3   am,x,y,z,a
second one: df2
df2 = pd.DataFrame({
'Remove': ['ak', 'b', 'k', 'a', 'am']})
df2
Looks as:
  Remove
0     ak
1      b
2      k
3      a
4     am
I want to remove the strings from df1['Value'] that match any entry in df2['Remove'].
Expected output is:
Sample  Value
Sam1    c
Sam2    d,e,f
Sam3    x,y,z
This code did not help me
Any help, thanks
Upvotes: 2
Views: 474
Reputation: 1934
This script will help you. Just iterate over the column and, for each row, keep only the elements that are not in the remove list:
remove = set(df2['Remove'])
for index, elements in enumerate(df1['Value']):
    elements = elements.split(',')
    # .loc avoids chained assignment; filtering the list keeps the original order
    df1.loc[index, 'Value'] = ','.join(e for e in elements if e not in remove)
The complete code will be something like this (it relies on df1 having the default 0..n-1 index):
import pandas as pd
df1 = pd.DataFrame({
'Sample': ['Sam1', 'Sam2', 'Sam3'],
'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am,x,y,z,a']
})
df2 = pd.DataFrame({
'Remove': ['ak', 'b', 'k', 'a', 'am']})
remove = set(df2['Remove'])
for index, elements in enumerate(df1['Value']):
    elements = elements.split(',')
    # .loc avoids chained assignment; filtering the list keeps the original order
    df1.loc[index, 'Value'] = ','.join(e for e in elements if e not in remove)
print(df1)
Output:
  Sample  Value
0   Sam1      c
1   Sam2  d,e,f
2   Sam3  x,y,z
Upvotes: 0
Reputation: 2430
Using apply as a one-liner:
df1['Value'] = df1['Value'].str.split(',').apply(lambda x:','.join([i for i in x if i not in df2['Remove'].values]))
Output:
>>> df1
  Sample  Value
0   Sam1      c
1   Sam2  d,e,f
2   Sam3  x,y,z
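A small variation on the one-liner, only as a sketch: build a set from the Remove column once, so each membership test does not rescan the array, and strip whitespace around each token (the sample data in the question contains 'am, x'). The name remove_set is just an illustrative choice, not something from the original post.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Sample': ['Sam1', 'Sam2', 'Sam3'],
    'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am, x,y,z,a'],
})
df2 = pd.DataFrame({'Remove': ['ak', 'b', 'k', 'a', 'am']})

# Build the lookup once; set membership is O(1) on average.
remove_set = set(df2['Remove'])

# Split each cell, strip stray spaces, drop unwanted tokens, and re-join.
df1['Value'] = df1['Value'].str.split(',').apply(
    lambda tokens: ','.join(
        t.strip() for t in tokens if t.strip() not in remove_set
    )
)
print(df1)
```

Because the tokens are filtered in order, the surviving items keep the order they had in the original string.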
Upvotes: 0
Reputation: 30022
You can use apply() to remove items from the df1 Value column if they appear in the df2 Remove column.
import pandas as pd
df1 = pd.DataFrame({
'Sample': ['Sam1', 'Sam2', 'Sam3'],
'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am, x,y,z,a']
})
df2 = pd.DataFrame({'Remove': ['ak', 'b', 'k', 'a', 'am']})
remove_list = df2['Remove'].values.tolist()
def remove_value(row, remove_list):
    # Strip each token so that ' x' (from 'am, x') is cleaned and compared correctly
    keep_list = [val.strip() for val in row['Value'].split(',') if val.strip() not in remove_list]
    return ','.join(keep_list)
df1['Value'] = df1.apply(remove_value, axis=1, args=(remove_list,))
print(df1)
  Sample  Value
0   Sam1      c
1   Sam2  d,e,f
2   Sam3  x,y,z
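If you would rather not call a Python function per row, one possible alternative (a different technique from the apply() shown here, offered as a sketch) is to explode the tokens into one row each, filter them with isin(), and join them back per original row. The variable names tokens and kept are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Sample': ['Sam1', 'Sam2', 'Sam3'],
    'Value': ['ak,b,c,k', 'd,k,e,b,f,a', 'am, x,y,z,a'],
})
df2 = pd.DataFrame({'Remove': ['ak', 'b', 'k', 'a', 'am']})

# One token per row; the original row label is repeated in the index.
tokens = df1['Value'].str.split(',').explode().str.strip()

# Keep only tokens that do not appear in the Remove column.
kept = tokens[~tokens.isin(df2['Remove'])]

# Re-join tokens per original row; rows left with no tokens become ''.
df1['Value'] = (
    kept.groupby(level=0).agg(','.join).reindex(df1.index, fill_value='')
)
print(df1)
```

The reindex step matters when every token of a row is removed: without it that row would come back as NaN instead of an empty string.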
Upvotes: 0