Reputation: 464
I have a dataframe df
that looks like this:
PO SO Date Name Qty
0 123 34 2020-01-05 Carl 5
1 111 55 2020-10-10 Beth 7
2 123 12 2020-02-03 Greg 11
3 101 55 2019-12-03 Carl 3
4 123 34 2020-11-30 Beth 24
5 111 55 2019-04-02 Greg 6
6 202 99 2020-05-06 Beth 19
What I would like to do is replace dates with the minimum date for the dataframe when grouped by PO
and SO
. For instance, there are two rows with a PO
of '123' and an SO
of '34'. Since the minimum Date
among these rows is '2020-01-05', both rows should have their Date
column set to '2020-01-05'.
Thus the result would looks like this:
PO SO Date Name Qty
0 123 34 2020-01-05 Carl 5
1 111 55 2019-04-02 Beth 7
2 123 12 2020-02-03 Greg 11
3 101 55 2019-12-03 Carl 3
4 123 34 2020-01-05 Beth 24
5 111 55 2019-04-02 Greg 6
6 202 99 2020-05-06 Beth 19
Upvotes: 1
Views: 90
Reputation: 16683
You can use transform
with groupby
to create a "calculated column", so that you can avoid a messy merge
:
df = pd.DataFrame({'PO': [123, 111, 123, 101, 123, 111, 202],
'SO': [34, 55, 12, 55, 34, 55, 99],
'Date': ['2020-01-05', '2020-10-10', '2020-02-03', '2019-12-03', '2020-11-30', '2019-04-02', '2020-05-06'],
'Name': ['Carl', 'Beth', 'Greg', 'Carl', 'Beth', 'Greg', 'Beth'],
'Qty': [5, 7, 11, 3, 24, 6, 19]})
df_grouped = df.copy()
df_grouped['Date'] = df_grouped.groupby(['PO', 'SO'])['Date'].transform('min')
df_grouped
Out[1]:
PO SO Date Name Qty
0 123 34 2020-01-05 Carl 5
1 111 55 2019-04-02 Beth 7
2 123 12 2020-02-03 Greg 11
3 101 55 2019-12-03 Carl 3
4 123 34 2020-01-05 Beth 24
5 111 55 2019-04-02 Greg 6
6 202 99 2020-05-06 Beth 19
Upvotes: 2
Reputation: 464
In order to accomplish this, we will create a key using PO
, SO
, and the minimum Date
for each combination of PO
and SO
. We use groupby
with min
to accomplish this.
import pandas as pd
df = pd.DataFrame({'PO': [123, 111, 123, 101, 123, 111, 202],
'SO': [34, 55, 12, 55, 34, 55, 99],
'Date': ['2020-01-05', '2020-10-10', '2020-02-03', '2019-12-03', '2020-11-30', '2019-04-02', '2020-05-06'],
'Name': ['Carl', 'Beth', 'Greg', 'Carl', 'Beth', 'Greg', 'Beth'],
'Qty': [5, 7, 11, 3, 24, 6, 19]})
df_grouped = df[['PO', 'SO', 'Date']].groupby(by=['PO', 'SO'], as_index=False, dropna=False).min()
print(df_grouped)
PO SO Date
0 101 55 2019-12-03
1 111 55 2019-04-02
2 123 12 2020-02-03
3 123 34 2020-01-05
4 202 99 2020-05-06
Now we can merge this with the original dataframe, replacing the old Date
column with the Date
column from df_grouped
.
df = pd.merge(df.drop(columns=['Date']), df_grouped, on=['PO', 'SO'])
df = df[['PO', 'SO', 'Date', 'Name', 'Qty']] # reset column order
print(df)
PO SO Date Name Qty
0 123 34 2020-01-05 Carl 5
1 123 34 2020-01-05 Beth 24
2 111 55 2019-04-02 Beth 7
3 111 55 2019-04-02 Greg 6
4 123 12 2020-02-03 Greg 11
5 101 55 2019-12-03 Carl 3
6 202 99 2020-05-06 Beth 19
Upvotes: 0