shadowtalker

Reputation: 13883

Pandas, assign and broadcast along one multi-index level

import numpy as np
import pandas as pd

np_rng = np.random.default_rng()

df = pd.DataFrame({
    'v1': ['a', 'b', 'b', 'b', 'a', 'a', 'c', 'c'],
    'v2': [1, 1, 1, 2, 1, 2, 1, 2],
    'val': np_rng.uniform(size=8),
})

g1 = df.groupby(['v1', 'v2'])[['val']].mean()
g2 = df.groupby('v1')[['val']].mean()

Is it possible to assign the column g2['val'] to g1 as a new column, matching on the v1 index level that both frames share and broadcasting the values across the v2 level of g1?

This is the desired result:

            val     val_m
v1 v2
a  1   0.574011  0.609789
   2   0.681344  0.609789
b  1   0.653014  0.719828
   2   0.853454  0.719828
c  1   0.279289  0.528449
   2   0.777608  0.528449

Upvotes: 1

Views: 571

Answers (1)

shadowtalker

Reputation: 13883

This is precisely what DataFrame.join is for! By default, it joins on the data frame index, and it has no trouble joining a single-level index against a MultiIndex as long as the level names match (here, v1). Don't forget: broadcasting along an index level is just a left join on that level.

g3 = g1.join(g2.rename(columns={'val': 'val_m'}))
print(g3)

This returns output in the desired shape (the numbers differ from the question because the data is random):

            val     val_m
v1 v2
a  1   0.563699  0.376579
   2   0.002338  0.376579
b  1   0.776224  0.659765
   2   0.426846  0.659765
c  1   0.053235  0.495630
   2   0.938024  0.495630
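
To make the "broadcast is just a left join" point concrete, here is a minimal sketch that builds the same column by hand and compares it with the join. The fixed seed and the names `joined`, `manual`, and `v1_labels` are my own additions for illustration, not part of the question; the manual route looks up each row's v1 label in g2 via reindex, which should be equivalent to the join as long as g2's index is unique.

```python
import numpy as np
import pandas as pd

# Fixed seed (an assumption, only so that repeated runs give the same numbers).
rng = np.random.default_rng(0)

df = pd.DataFrame({
    'v1': ['a', 'b', 'b', 'b', 'a', 'a', 'c', 'c'],
    'v2': [1, 1, 1, 2, 1, 2, 1, 2],
    'val': rng.uniform(size=8),
})

g1 = df.groupby(['v1', 'v2'])[['val']].mean()  # indexed by (v1, v2)
g2 = df.groupby('v1')[['val']].mean()          # indexed by v1 only

# Left join on the shared index level name 'v1'.
joined = g1.join(g2.rename(columns={'val': 'val_m'}))

# Manual broadcast: take the v1 label of every row in g1,
# look it up in g2, and attach the values positionally.
v1_labels = g1.index.get_level_values('v1')
manual = g1.assign(val_m=g2['val'].reindex(v1_labels).to_numpy())

print(joined)
print(np.allclose(joined['val_m'], manual['val_m']))
```

The join version is shorter and keeps working if g2 has several columns to bring over, which is why I would reach for it first; the manual version is mostly useful for seeing what the join does under the hood.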

Upvotes: 2
