import numpy as np
import pandas as pd
np_rng = np.random.default_rng()
df = pd.DataFrame({
    'v1': ['a', 'b', 'b', 'b', 'a', 'a', 'c', 'c'],
    'v2': [1, 1, 1, 2, 1, 2, 1, 2],
    'val': np_rng.uniform(size=8),
})
g1 = df.groupby(['v1', 'v2'])[['val']].mean()
g2 = df.groupby('v1')[['val']].mean()
Is it possible to assign the column g2['val'] to g1, broadcasting over the v1 index level of g1, but not g2?
This is the desired result:
            val     val_m
v1 v2
a  1   0.574011  0.609789
   2   0.681344  0.609789
b  1   0.653014  0.719828
   2   0.853454  0.719828
c  1   0.279289  0.528449
   2   0.777608  0.528449
This is precisely what DataFrame.join is for! By default, it uses the data frame index to join, and it has no trouble joining on a partial index. Don't forget: broadcasting along an index is just a left join on the index.
g3 = g1.join(g2.rename(columns={'val': 'val_m'}))
print(g3)
This returns the desired output:
            val     val_m
v1 v2
a  1   0.563699  0.376579
   2   0.002338  0.376579
b  1   0.776224  0.659765
   2   0.426846  0.659765
c  1   0.053235  0.495630
   2   0.938024  0.495630
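If you want to convince yourself that the join really broadcasts along v1, here is a minimal self-contained sketch that rebuilds the frames and compares the joined column against a manual lookup. The fixed seed (0) and the name `expected` are my own choices for illustration, not part of the question; the manual lookup via `get_level_values` is just one possible cross-check.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'v1': ['a', 'b', 'b', 'b', 'a', 'a', 'c', 'c'],
    'v2': [1, 1, 1, 2, 1, 2, 1, 2],
    'val': rng.uniform(size=8),
})
g1 = df.groupby(['v1', 'v2'])[['val']].mean()
g2 = df.groupby('v1')[['val']].mean()

# Join on the shared index level name 'v1': each g2 row is repeated
# for every (v1, v2) pair in g1 that has the same v1 label.
g3 = g1.join(g2.rename(columns={'val': 'val_m'}))

# Manual cross-check: look up each row's v1 label in g2 by hand.
v1_labels = g1.index.get_level_values('v1')
expected = pd.Series(g2['val'].reindex(v1_labels).to_numpy(), index=g1.index)

# Align on g1's index before comparing, so row order does not matter.
assert np.allclose(g3['val_m'].reindex(g1.index).to_numpy(), expected.to_numpy())
```

The assertion passing means every (v1, v2) row of g3 carries the per-v1 mean from g2, which is exactly the broadcast the question asks for.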