FunnyChef
FunnyChef

Reputation: 1946

Update pandas DataFrame based on a different DataFrame

I have two pandas DataFrames:

df1

key    id    count
100    9821   7
200    9813   10

df2

nodekey    nodeid    
100        9821   
200        9813  

If the nodekey+nodeid in df2 match key+id in df1, count in df1 has to be set to 0. So, the result of the example above should be;

key    id    count
100    9821   0
200    9813   0

I tried the following (matching on key and nodekey only, as a test) but receive an error:

df1['count']=np.where((df1.key == df2.nodekey),0)

ValueError: either both or neither of x and y should be given

Suggestions?

Upvotes: 1

Views: 62

Answers (2)

DYZ
DYZ

Reputation: 57033

Merge the dataframes using the left merge (the rows that are present in df1 but not in df2 will be filled with nans):

combined = df1.merge(df2, left_on=['key', 'id'],
                         right_on=['nodekey', 'nodeid'], how='left')

Update the counts for the rows that are non-nan:

combined.loc[combined.nodekey.notnull(), 'count'] = 0

Cleanup the unwanted columns:

combined.drop(['nodekey', 'nodeid'], axis=1, inplace=True)
#   key    id  count
#0  100  9821      0
#1  200  9813      0
#2  300  9855      7

Upvotes: 0

rafaelc
rafaelc

Reputation: 59264

This should work

df1.loc[df1[['key', 'id']].transform(tuple,1).isin(df2[['nodekey', 'nodeid']].transform(tuple,1)), "count"] = 0

which is basically using

df.loc[mask, 'count']=0 

where mask is True for rows where tuple ('key', 'id') matches any tuple ('nodekey', 'nodeid')

Upvotes: 2

Related Questions