user5813190

pandas compare dataframe values and update column

I have two DataFrames. A column in the first DataFrame contains values that also appear in the second DataFrame's c2 column, sometimes inside a list, e.g. c2 = ['test1', 'test2'].

1.

import pandas as pd

inp = [{'c1':'test1'}, {'c1': 'test2'}, {'c1':'test3'}]
df1 = pd.DataFrame(inp)
print(df1)

Output:

      c1
0  test1
1  test2
2  test3

import pandas as pd
# 'teest2' in the second dict must be quoted, otherwise Python raises a NameError
inp = [{'c1':10, 'c2':['test1', 'teest2'], 'test1':'', 'teest2':''}, {'c1':11,'c2':'teest2'}, {'c1':12,'c2':120}, {'test1':''}, {'teest2':''}]
df2 = pd.DataFrame(inp)
print(df2)

Output:

     c1               c2 test1 teest2
0  10.0  [test1, teest2]   NaN    NaN
1  11.0           teest2   NaN    NaN
2  12.0              120   NaN    NaN
3   NaN              NaN   NaN    NaN

I would like to check whether the values in df1 (I collected all values of column c1 into a list) are present in df2's c2 column, e.g. ['test1', 'test2'], and then set the column with the matching name to 'yes' or 'no'. Expected result:

     c1               c2 test1 teest2
0  10.0  [test1, teest2]   yes     no
1  11.0       ['teest2']    no    yes
2  12.0              120    no     no
3   NaN              NaN    no     no

Upvotes: 1

Views: 51

Answers (1)

jezrael

Reputation: 862681

Create a DataFrame from the c2 values, first wrapping scalars in one-element lists, and test it with DataFrame.isin. Then build a yes/no array with numpy.where and assign it to the columns:

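import numpy as np

# Wrap scalars in one-element lists so every row of c2 is a list
L = [x if isinstance(x, list) else [x] for x in df2['c2']]
# Pass a plain list: isin with a Series is aligned on the index, not a membership test
mask = pd.DataFrame(L).isin(df1['c1'].tolist())

df2[['test1','teest2']] = np.where(mask, 'yes','no')
print(df2)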
     c1               c2 test1 teest2
0  10.0  [test1, teest2]   yes     no
1  11.0              110    no     no
2  12.0              120    no     no
3   NaN              NaN    no     no
4   NaN              NaN    no     no
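
One detail worth checking in the comparison step: when DataFrame.isin receives a Series, pandas compares it by index label rather than testing membership, so passing a plain list (for example via .tolist()) is the safer way to ask "is this value anywhere in df1". A minimal sketch with made-up data (the names wanted and frame are illustrative, not from the question):

```python
import pandas as pd

# Illustrative data only: the values sit at different index positions
# in the Series and in the frame.
wanted = pd.Series(['test1', 'test2', 'test3'])   # index 0, 1, 2
frame = pd.DataFrame({0: ['test2', 'test1'], 1: ['test3', None]})

# Series argument: each row is compared with the Series value
# that has the same index label.
aligned = frame.isin(wanted)

# List argument: plain "is this value anywhere in the list" test.
membership = frame.isin(wanted.tolist())

print(aligned)
print(membership)
```

Here aligned is False everywhere because no value sits at the same index label in both objects, while membership flags every cell whose value occurs in wanted.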

If the lists can hold more values (3, 4, ... N), build the yes/no columns with the DataFrame constructor and join them to the original columns:

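L = [x if isinstance(x, list) else [x] for x in df2['c2']]

# Pass a plain list: isin with a Series is aligned on the index, not a membership test
mask = pd.DataFrame(L).isin(df1['c1'].tolist())

arr = np.where(mask, 'yes','no')

# One column per list position, named test0, test1, ...
df2 = df2[['c1','c2']].join(pd.DataFrame(arr, index=df2.index).add_prefix('test'))
print(df2)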
     c1               c2 test0 test1
0  10.0  [test1, teest2]   yes    no
1  11.0              110    no    no
2  12.0              120    no    no
3   NaN              NaN    no    no
4   NaN              NaN    no    no
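
For reference, the same idea can be wrapped into a small function and tried on a three-element list. This is only a sketch: flag_matches and its parameters are hypothetical names, and the sample df2 below is made up to show a longer list, not taken from the question.

```python
import numpy as np
import pandas as pd

def flag_matches(df, column, wanted, prefix='test'):
    """Hypothetical helper: add one yes/no column per list position."""
    # Wrap scalars so every row is a list, then spread lists over columns.
    rows = [x if isinstance(x, list) else [x] for x in df[column]]
    expanded = pd.DataFrame(rows, index=df.index)
    # list(...) forces a membership test instead of index alignment.
    mask = expanded.isin(list(wanted))
    flags = pd.DataFrame(np.where(mask, 'yes', 'no'), index=df.index)
    return df.join(flags.add_prefix(prefix))

df1 = pd.DataFrame({'c1': ['test1', 'test2', 'test3']})
# Made-up sample with a three-element list in the first row.
df2 = pd.DataFrame({'c1': [10, 11, 12],
                    'c2': [['test1', 'teest2', 'test3'], 'test2', 120]})

out = flag_matches(df2, 'c2', df1['c1'])
print(out)
```

The number of added columns follows the longest list in c2, so shorter rows are padded with missing values and end up as 'no'.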

Upvotes: 2
