Reputation: 407

Mask based on two different Pandas Dataframes?

Suppose I have two pandas dataframes:

In [1]: dates = pd.date_range('20170101',periods=6)
df1 = pd.DataFrame(np.empty([len(dates),2]),index=dates,columns=['foo','bar'])
df1['foo'].loc[0:2] = 'A'
df1['bar'].loc[0:3] = 'A'
df1['foo'].loc[2:6] = 'B'
df1['bar'].loc[3:6] = 'B'
df2 = pd.DataFrame(np.random.randint(10,size=(6,2)),index=dates,columns=df1.columns)
print(df1)
print(df2)

Out [1]:
           foo bar
2017-01-01   A   A
2017-01-02   A   A
2017-01-03   B   A
2017-01-04   B   B
2017-01-05   B   B
2017-01-06   B   B 

            foo  bar
2017-01-01    5    3
2017-01-02    6    9
2017-01-03    5    9
2017-01-04    7    5
2017-01-05    0    2
2017-01-06    0    0

I'm interested in a creating a 3rd df populated with the max of df2 based on df1. For example, the output would look something like this:

            foo  bar
2017-01-01    6    9
2017-01-02    6    9
2017-01-03    7    9
2017-01-04    7    5
2017-01-05    7    5
2017-01-06    7    5

Surely there's a succinct way to do this, right?

Upvotes: 1

Answers (2)

piRSquared

Reputation: 294508

You can add the column values of df1 to df1 to get unique groups for which to groupby

df2.stack().groupby(
    df1.add(df1.columns.to_series()).stack()
).transform('max').unstack()

            foo  bar
2017-01-01    6    9
2017-01-02    6    9
2017-01-03    7    9
2017-01-04    7    5
2017-01-05    7    5
2017-01-06    7    5

To account for non strings

df2.stack().groupby(
    df1.astype(str).add(df1.columns.to_series().astype(str)).stack()
).transform('max').unstack()

Upvotes: 0

akuiper

Reputation: 215107

One option is to concatenate the two data frames and assign a key to each one, transform the resulting data frame to long format, and then calculate the max grouped by the key and the column names:

(pd.concat([df1, df2], keys=["one", "two"], axis=1)
 .stack(level=1).groupby(level=1)
 .apply(lambda g: g.groupby("one",as_index=False)["two"].transform("max"))
 .two.unstack(level=1))

Upvotes: 2

Mask based on two different Pandas Dataframes?

Answers (2)

Related Questions