Josh Friedlander
Josh Friedlander

Reputation: 11657

Get subset of Pandas df where multiple columns match values from another df

I have two dataframes with multi-indexes looking like this:

df1

pd.DataFrame({'observation': {('foo', '2017-04-16'): 'green',
  ('bar', '2017-04-25'): 'red',
  ('zap', '2017-04-16'): 'red',
  ('zip', '2017-04-25'): 'blue',
  ('zip', '2017-04-16'): 'white'},
 'observation': {('zap', '2017-04-16'): np.nan,
  ('bar', '2017-04-27'): 'white',
  ('foo', '2017-05-16'): np.nan,
  ('foo', '2017-04-25'): 'red',
  ('zip', '2017-08-16'): 'red'}})

df2

pd.DataFrame({'foo': {('00', '08'): '0.0',
  ('01', '08'): '0.0',
  ('01', '08'): '0.0',
  ('00', '08'): '1.0',
  ('03', '08'): '1.0',
  ('06', '08'): '0.0',
  ('00', '08'): '1.0',
  ('00', '08'): '1.0',
  ('00', '08'): '0.0',
  ('02', '08'): '0.0'},
 'client_id': {('00', '08'): '1.0',
  ('01', '08'): '1.0',
  ('01', '08'): '1.0',
  ('00', '08'): '1.0',
  ('03', '08'): '1.0',
  ('06', '08'): '1.0',
  ('00', '08'): '1.0',
  ('00', '08'): '1.0',
  ('00', '08'): '1.0',
  ('02', '08'): '1.0'},
 'execution_date': {('00', '08'): '2019-01-09',
  ('01', '08'): '2019-01-09',
  ('01', '08'): '2019-01-09',
  ('00', '08'): '2019-01-09',
  ('03', '08'): '2019-01-09',
  ('06', '08'): '2019-01-09',
  ('00', '08'): '2019-01-09',
  ('00', '08'): '2019-01-09',
  ('00', '08'): '2019-01-09',
  ('02', '08'): '2019-01-09'},
 'del': {('00', '08'): '0.0',
  ('01', '08'): '0.0',
  ('01', '08'): '0.0',
  ('00', '08'): '0.0',
  ('03', '08'): '0.0',
  ('06', '08'): '0.0',
  ('00', '08'): '0.0',
  ('00', '08'): '0.0',
  ('00', '08'): '0.0',
  ('02', '08'): '0.0'},
 'act': {('00', '08'): '11',
  ('01', '08'): '03',
  ('01', '08'): '06',
  ('00', '08'): '07',
  ('03', '08'): '07',
  ('06', '08'): '11',
  ('00', '08'): '28',
  ('00', '08'): '08',
  ('00', '08'): '14',
  ('02', '08'): '26'},
 'obs': {('00', '08'): '02',
  ('01', '08'): '02',
  ('01', '08'): '02',
  ('00', '08'): '02',
  ('03', '08'): '02',
  ('06', '08'): '02',
  ('00', '08'): '02',
  ('00', '08'): '02',
  ('00', '08'): '02',
  ('02', '08'): '02'}})

The two are not the same size, and the values don't always overlap, but every index pair found in df1 is in df2. What I would like to do is update the observation col in the df1 with the values of observation in df2, wherever it matches.

In other words, I would like to do the equivalent of an inner join based on the multi-index, and then overwrite the values in observation in df1 with those from df2. But is there a way to do this in one step, using loc/indexing? (This is structured as an index problem, but if there is a way to solve it using reset_index() that would be fine too.)

Desired output:

        obs
00  04  30
    08  02
    09  16
    10  26
    16  26
01  01  30
    07  16
02  08  02
03  13  26
07  15  26

Upvotes: 0

Views: 232

Answers (1)

Dani Mesejo
Dani Mesejo

Reputation: 61910

If I understood correctly, you could do:

df2 = pd.DataFrame({'observation': {('foo', '2017-04-16'): 'green',
  ('bar', '2017-04-25'): 'red',
  ('zap', '2017-04-16'): 'red',
  ('zip', '2017-04-25'): 'blue',
  ('zip', '2017-04-16'): 'white'},
 'observation': {('zap', '2017-04-16'): 'yellow',
  ('bar', '2017-04-27'): 'white',
  ('foo', '2017-05-16'): 'black',
  ('foo', '2017-04-25'): 'red',
  ('zip', '2017-08-16'): 'red'}})

df['observation'] = df.index.map(dict(zip(df2.index, df2.observation)))

Output

               observation
bar 2017-04-27       white
foo 2017-04-25         red
    2017-05-16       black
zap 2017-04-16      yellow
zip 2017-08-16         red

Upvotes: 1

Related Questions