ahh_real_numbers
ahh_real_numbers

Reputation: 464

Apply results of pandas groupby to multiple rows

I have a dataframe df that looks like this:

    PO  SO        Date  Name  Qty
0  123  34  2020-01-05  Carl    5
1  111  55  2020-10-10  Beth    7
2  123  12  2020-02-03  Greg   11
3  101  55  2019-12-03  Carl    3
4  123  34  2020-11-30  Beth   24
5  111  55  2019-04-02  Greg    6
6  202  99  2020-05-06  Beth   19

What I would like to do is replace dates with the minimum date for the dataframe when grouped by PO and SO. For instance, there are two rows with a PO of '123' and an SO of '34'. Since the minimum Date among these rows is '2020-01-05', both rows should have their Date column set to '2020-01-05'.

Thus the result would looks like this:

    PO  SO        Date  Name  Qty
0  123  34  2020-01-05  Carl    5
1  111  55  2019-04-02  Beth    7
2  123  12  2020-02-03  Greg   11
3  101  55  2019-12-03  Carl    3
4  123  34  2020-01-05  Beth   24
5  111  55  2019-04-02  Greg    6
6  202  99  2020-05-06  Beth   19

Upvotes: 1

Views: 90

Answers (2)

David Erickson
David Erickson

Reputation: 16683

You can use transform with groupby to create a "calculated column", so that you can avoid a messy merge:

df = pd.DataFrame({'PO':  [123, 111, 123, 101, 123, 111, 202], 
                  'SO':   [34, 55, 12, 55, 34, 55, 99], 
                  'Date': ['2020-01-05', '2020-10-10', '2020-02-03', '2019-12-03', '2020-11-30', '2019-04-02', '2020-05-06'], 
                  'Name': ['Carl', 'Beth', 'Greg', 'Carl', 'Beth', 'Greg', 'Beth'], 
                  'Qty':  [5, 7, 11, 3, 24, 6, 19]})

df_grouped = df.copy()
df_grouped['Date'] = df_grouped.groupby(['PO', 'SO'])['Date'].transform('min')
df_grouped
Out[1]: 
    PO  SO        Date  Name  Qty
0  123  34  2020-01-05  Carl    5
1  111  55  2019-04-02  Beth    7
2  123  12  2020-02-03  Greg   11
3  101  55  2019-12-03  Carl    3
4  123  34  2020-01-05  Beth   24
5  111  55  2019-04-02  Greg    6
6  202  99  2020-05-06  Beth   19

Upvotes: 2

ahh_real_numbers
ahh_real_numbers

Reputation: 464

In order to accomplish this, we will create a key using PO, SO, and the minimum Date for each combination of PO and SO. We use groupby with min to accomplish this.

import pandas as pd

df = pd.DataFrame({'PO':  [123, 111, 123, 101, 123, 111, 202], 
                  'SO':   [34, 55, 12, 55, 34, 55, 99], 
                  'Date': ['2020-01-05', '2020-10-10', '2020-02-03', '2019-12-03', '2020-11-30', '2019-04-02', '2020-05-06'], 
                  'Name': ['Carl', 'Beth', 'Greg', 'Carl', 'Beth', 'Greg', 'Beth'], 
                  'Qty':  [5, 7, 11, 3, 24, 6, 19]})

df_grouped = df[['PO', 'SO', 'Date']].groupby(by=['PO', 'SO'], as_index=False, dropna=False).min()

print(df_grouped)

    PO  SO        Date
0  101  55  2019-12-03
1  111  55  2019-04-02
2  123  12  2020-02-03
3  123  34  2020-01-05
4  202  99  2020-05-06

Now we can merge this with the original dataframe, replacing the old Date column with the Date column from df_grouped.

df = pd.merge(df.drop(columns=['Date']), df_grouped, on=['PO', 'SO']) 

df = df[['PO', 'SO', 'Date', 'Name', 'Qty']] # reset column order

print(df)

    PO  SO        Date  Name  Qty
0  123  34  2020-01-05  Carl    5
1  123  34  2020-01-05  Beth   24
2  111  55  2019-04-02  Beth    7
3  111  55  2019-04-02  Greg    6
4  123  12  2020-02-03  Greg   11
5  101  55  2019-12-03  Carl    3
6  202  99  2020-05-06  Beth   19

Upvotes: 0

Related Questions