Reputation: 407
Suppose I have two pandas dataframes:
In [1]: dates = pd.date_range('20170101',periods=6)
df1 = pd.DataFrame(np.empty([len(dates),2]),index=dates,columns=['foo','bar'])
df1['foo'].loc[0:2] = 'A'
df1['bar'].loc[0:3] = 'A'
df1['foo'].loc[2:6] = 'B'
df1['bar'].loc[3:6] = 'B'
df2 = pd.DataFrame(np.random.randint(10,size=(6,2)),index=dates,columns=df1.columns)
print(df1)
print(df2)
Out [1]:
foo bar
2017-01-01 A A
2017-01-02 A A
2017-01-03 B A
2017-01-04 B B
2017-01-05 B B
2017-01-06 B B
foo bar
2017-01-01 5 3
2017-01-02 6 9
2017-01-03 5 9
2017-01-04 7 5
2017-01-05 0 2
2017-01-06 0 0
I'm interested in a creating a 3rd df populated with the max of df2 based on df1. For example, the output would look something like this:
foo bar
2017-01-01 6 9
2017-01-02 6 9
2017-01-03 7 9
2017-01-04 7 5
2017-01-05 7 5
2017-01-06 7 5
Surely there's a succinct way to do this, right?
Upvotes: 1
Views: 1463
Reputation: 294508
You can add the column values of df1
to df1
to get unique groups for which to groupby
df2.stack().groupby(
df1.add(df1.columns.to_series()).stack()
).transform('max').unstack()
foo bar
2017-01-01 6 9
2017-01-02 6 9
2017-01-03 7 9
2017-01-04 7 5
2017-01-05 7 5
2017-01-06 7 5
To account for non strings
df2.stack().groupby(
df1.astype(str).add(df1.columns.to_series().astype(str)).stack()
).transform('max').unstack()
Upvotes: 0
Reputation: 215107
One option is to concatenate the two data frames and assign a key to each one, transform the resulting data frame to long format, and then calculate the max grouped by the key and the column names:
(pd.concat([df1, df2], keys=["one", "two"], axis=1)
.stack(level=1).groupby(level=1)
.apply(lambda g: g.groupby("one",as_index=False)["two"].transform("max"))
.two.unstack(level=1))
Upvotes: 2