phaebz
phaebz

Reputation: 423

Pandas broadcast across primary label of MultiIndex

Given a DataFrame A with MultiIndex and a DataFrame B with one-dimensional index, how to update column values of A with new values from B where the index of B should be matched with the first index label of A. The values of A should be "broadcast" across the lower-level index labels.

Test data:

begin = [10, 10, 12, 12, 14, 14]
end = [10, 11, 12, 13, 14, 15]
values = [1, 2, 3, 4, 5, 6]

mi = pd.DataFrame({'begin': begin,
                   'end': end,
                   'value': values})
mi.set_index(['begin', 'end'], inplace=True)

si = pd.DataFrame.from_dict(dict(zip([10, 12],
                                     [10, 20])), orient='index')
si.columns = ['value']

And the desired result should be

            value
begin end       
10    10       10
      11       10
12    12       20
      13       20
14    14       5
      15       6

I was trying to come up with a solution based on Pandas DataFrame.update with MultiIndex label, but to no avail so far because of fewer values in DataFrame B (si).

Upvotes: 1

Views: 371

Answers (2)

creanion
creanion

Reputation: 2743

You can use broadcasting like this:

_, si_broadcast = mi.align(si.rename_axis(index="begin"))
mi_updated = mi.mask(~pd.isna(si_broadcast), si_broadcast)
  1. rename the index in si to begin so that it matches the index level it corresponds to in mi
  2. Use align for explicit broadcasting
  3. Create an update that takes the value from si if it exists, else takes from mi using mask.

Upvotes: 0

knagaev
knagaev

Reputation: 2957

I don't know how my decision is good enough but please try it :)

mi.update(pd.merge(mi.reset_index(), si.reset_index(), left_on='begin', right_on='index', suffixes=('_x', '')).set_index(['begin', 'end']))

Upvotes: 1

Related Questions