Reputation:
I have two DataFrames. A column of the first DataFrame contains values that are also present in the second DataFrame, where column c2 stores them as lists, e.g. c2 = ['test1', 'teest2'].
1. First DataFrame (df1):
import pandas as pd
inp = [{'c1':'test1'}, {'c1': 'test2'}, {'c1':'test3'}]
df1 = pd.DataFrame(inp)
print(df1)
Output:
      c1
0  test1
1  test2
2  test3
2. Second DataFrame (df2):
import pandas as pd
inp = [{'c1':10, 'c2':['test1', 'teest2'], 'test1':'', 'teest2':''}, {'c1':11, 'c2':'teest2'}, {'c1':12, 'c2':120}, {'test1':''}, {'teest2':''}]
df2 = pd.DataFrame(inp)
print(df2)
Output:
     c1               c2  test1  teest2
0  10.0  [test1, teest2]    NaN     NaN
1  11.0           teest2    NaN     NaN
2  12.0              120    NaN     NaN
3   NaN              NaN    NaN     NaN
I would like to check whether the values in df1 (I generated a list of all the values from column c1) are present in df2's column c2, e.g. ['test1', 'teest2'], and then set the column whose name matches the value to 'yes' or 'no'.
Expected output:
     c1               c2  test1  teest2
0  10.0  [test1, teest2]    yes      no
1  11.0       ['teest2']     no     yes
2  12.0              120     no      no
3   NaN              NaN     no      no
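To see which individual elements of c2 occur in df1['c1'] before building any yes/no columns, Series.explode (pandas 0.25 or newer) can flatten the lists. This is a minimal sketch using the question's sample data, assuming 'teest2' is meant as a quoted string and leaving out the empty helper columns and rows:

```python
import pandas as pd

# Sample data from the question ('teest2' quoted; empty helper columns omitted)
df1 = pd.DataFrame([{'c1': 'test1'}, {'c1': 'test2'}, {'c1': 'test3'}])
df2 = pd.DataFrame([{'c1': 10, 'c2': ['test1', 'teest2']},
                    {'c1': 11, 'c2': 'teest2'},
                    {'c1': 12, 'c2': 120}])

# explode() gives each list element its own row and repeats the original
# index; scalars such as 'teest2' or 120 pass through unchanged
elements = df2['c2'].explode()

# True where the element is one of the values generated from df1['c1']
present = elements.isin(df1['c1'].tolist())

result = elements.to_frame('element').assign(in_df1=present.to_numpy())
print(result)
```

The repeated index (0, 0, 1, 2) shows which original row each element came from, so the flags can later be grouped back per row.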
Upvotes: 1
Views: 51
Reputation: 862681
Create a DataFrame from the lists, converting scalars to one-element lists first, and compare it with DataFrame.isin; then create a yes/no array with numpy.where and assign it to the columns:
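import numpy as np
import pandas as pd
# scalars -> one-element lists; pass a list, since DataFrame.isin aligns a Series argument on the index
L = [x if isinstance(x, list) else [x] for x in df2['c2']]
mask = pd.DataFrame(L).isin(df1['c1'].tolist())
df2[['test1','teest2']] = np.where(mask, 'yes', 'no')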
print (df2)
     c1               c2  test1  teest2
0  10.0  [test1, teest2]    yes      no
1  11.0              110     no      no
2  12.0              120     no      no
3   NaN              NaN     no      no
4   NaN              NaN     no      no
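The assignment above is positional: the first element of each list fills test1 and the second fills teest2, so a list such as ['teest2', 'test1'] would land in the wrong columns. If the intent is to fill each column by its name ('yes' when that name appears in the row's c2 and also occurs in df1['c1']), a sketch under that assumption could look like this; `cols` is a hypothetical list of target columns taken from the sample data:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([{'c1': 'test1'}, {'c1': 'test2'}, {'c1': 'test3'}])
df2 = pd.DataFrame([{'c1': 10, 'c2': ['test1', 'teest2']},
                    {'c1': 11, 'c2': 'teest2'},
                    {'c1': 12, 'c2': 120}])

allowed = set(df1['c1'])     # values generated from df1
cols = ['test1', 'teest2']   # hypothetical: columns to fill, matched by name

# Wrap scalars so every c2 cell is a list
lists = [x if isinstance(x, list) else [x] for x in df2['c2']]

for col in cols:
    # 'yes' only if the column's name is in this row's list AND in df1['c1'],
    # regardless of where in the list it appears
    hit = [col in lst and col in allowed for lst in lists]
    df2[col] = np.where(hit, 'yes', 'no')

print(df2)
```

Looping over names costs one pass per column, which is fine for a handful of columns and avoids depending on list order.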
If the lists can contain more values (3, 4, ... N), create the new columns with the DataFrame constructor and join them to the original columns:
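import numpy as np
import pandas as pd
# scalars -> one-element lists; pass a list, since DataFrame.isin aligns a Series argument on the index
L = [x if isinstance(x, list) else [x] for x in df2['c2']]
mask = pd.DataFrame(L).isin(df1['c1'].tolist())
arr = np.where(mask, 'yes', 'no')
# one yes/no column per list position: test0, test1, ...
df2 = df2[['c1','c2']].join(pd.DataFrame(arr, index=df2.index).add_prefix('test'))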
print (df2)
     c1               c2  test0  test1
0  10.0  [test1, teest2]    yes     no
1  11.0              110     no     no
2  12.0              120     no     no
3   NaN              NaN     no     no
4   NaN              NaN     no     no
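For a quick end-to-end check of the N-value variant, here is a self-contained sketch. The three-element list in the first row is hypothetical, added only to show that a third column (test2) appears without changing the code. It passes df1['c1'].tolist() rather than the Series, because DataFrame.isin compares a Series argument row by row on the index instead of treating it as a set of candidates:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([{'c1': 'test1'}, {'c1': 'test2'}, {'c1': 'test3'}])
# Hypothetical data: the first list has three elements instead of two
df2 = pd.DataFrame([{'c1': 10, 'c2': ['test1', 'teest2', 'test3']},
                    {'c1': 11, 'c2': 'teest2'},
                    {'c1': 12, 'c2': 120}])

# Wrap scalars in one-element lists; the DataFrame constructor pads
# shorter rows with missing values, which isin() reports as False
L = [x if isinstance(x, list) else [x] for x in df2['c2']]
mask = pd.DataFrame(L).isin(df1['c1'].tolist())

# One yes/no column per list position, named test0, test1, test2, ...
arr = np.where(mask, 'yes', 'no')
out = df2[['c1', 'c2']].join(pd.DataFrame(arr, index=df2.index).add_prefix('test'))
print(out)
```

The number of new columns follows the longest list, so rows with fewer values simply get 'no' in the extra positions.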
Upvotes: 2