Reputation: 367
I have a pandas DataFrame like this:
import pandas as pd

data = {'ID_1': ['A', 'A', 'A', 'B', 'B', 'B'],
        'ID_2': [1, 2, 2, 1, 1, 2],
        'DATE': ['2021-11-21', '2021-12-19', '2021-09-05', '2021-11-07', '2021-12-05', '2021-12-26'],
        'VALUE': [0.5, 0.5, 0.5, 0.6, 0.6, 0.6]}
df = pd.DataFrame(data)
And I would like to keep the value in column 'VALUE' only for the lowest date in column 'DATE' within each subset of 'ID_1' and 'ID_2'. The desired output looks like this:
data = {'ID_1': ['A', 'A', 'A', 'B', 'B', 'B'],
        'ID_2': [1, 2, 2, 1, 1, 2],
        'DATE': ['2021-11-21', '2021-12-19', '2021-09-05', '2021-11-07', '2021-12-05', '2021-12-26'],
        'VALUE': [0.5, np.NaN, 0.5, 0.6, np.NaN, 0.6]}
df = pd.DataFrame(data)
What I tried is to create a function which groups this DataFrame twice, but I end up with ValueError: Length of values (2) does not match length of index (1).
My function:
def foo(val):
    def add_mask(val):
        val.reset_index(inplace=True)
        min_date = val['DATE'].min()
        mask = val.DATE == min_date
        return val[mask]
    return val.groupby('ID_1').apply(add_mask)
test = df.groupby('ID_2').apply(foo)
Upvotes: 3
Views: 1378
Reputation: 29397
Just another way to do it, keeping 'VALUE' only where the date is the group minimum:
df['VALUE'] = df['VALUE'].where(df.groupby(['ID_1', 'ID_2']).DATE.transform(lambda x: x == min(x)))
Using the boolean (x == min(x)) on its own, you can see which rows hold each group's minimum:
df['is_min'] = df.groupby(['ID_1','ID_2']).DATE.transform(lambda x: x==min(x))
#
# ID_1 ID_2 DATE VALUE is_min
# 0 A 1 2021-11-21 0.5 True
# 1 A 2 2021-12-19 0.5 False
# 2 A 2 2021-09-05 0.5 True
# 3 B 1 2021-11-07 0.6 True
# 4 B 1 2021-12-05 0.6 False
# 5 B 2 2021-12-26 0.6 True
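Putting the two steps together, a minimal self-contained sketch using the question's sample data (the is_min column name is from the snippet above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID_1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ID_2': [1, 2, 2, 1, 1, 2],
    'DATE': ['2021-11-21', '2021-12-19', '2021-09-05',
             '2021-11-07', '2021-12-05', '2021-12-26'],
    'VALUE': [0.5, 0.5, 0.5, 0.6, 0.6, 0.6],
})

# Flag each group's earliest date (ISO date strings compare chronologically).
df['is_min'] = df.groupby(['ID_1', 'ID_2']).DATE.transform(lambda x: x == min(x))

# Blank out VALUE on every row that is not the group minimum.
df.loc[~df['is_min'], 'VALUE'] = np.nan
```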
Upvotes: 0
Reputation:
You can groupby "ID_1" and "ID_2" and transform the min of "DATE" for each group. Then use eq to identify the rows where the group minimum occurs. Finally, use where to assign NaN to the "VALUE"s that are not on a minimum date:
df['VALUE'] = df['VALUE'].where(df.groupby(['ID_1','ID_2'])['DATE'].transform('min').eq(df['DATE']))
Output:
ID_1 ID_2 DATE VALUE
0 A 1 2021-11-21 0.5
1 A 2 2021-12-19 NaN
2 A 2 2021-09-05 0.5
3 B 1 2021-11-07 0.6
4 B 1 2021-12-05 NaN
5 B 2 2021-12-26 0.6
Function foo doesn't work because you never use the mask you create in it to modify "VALUE" in each group. If you replace
return val[mask]
with
val['VALUE'] = val['VALUE'].where(mask)
return val
it will produce the expected outcome (you'll need to fix the index but the general structure will be what you expect).
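For completeness, here is a sketch of the repaired foo, with the index cleaned up via group_keys=False (an assumption about your pandas version; on older releases you may need to drop the extra index levels after the apply instead):

```python
import pandas as pd

df = pd.DataFrame({
    'ID_1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ID_2': [1, 2, 2, 1, 1, 2],
    'DATE': ['2021-11-21', '2021-12-19', '2021-09-05',
             '2021-11-07', '2021-12-05', '2021-12-26'],
    'VALUE': [0.5, 0.5, 0.5, 0.6, 0.6, 0.6],
})

def foo(val):
    def add_mask(val):
        mask = val['DATE'] == val['DATE'].min()
        val = val.copy()                        # avoid mutating the group view
        val['VALUE'] = val['VALUE'].where(mask) # NaN where not the min date
        return val
    return val.groupby('ID_1', group_keys=False).apply(add_mask)

test = df.groupby('ID_2', group_keys=False).apply(foo).sort_index()
```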
Upvotes: 2
Reputation: 3936
Many elegant answers already, but here is how I would go about it:
grp = df.groupby(["ID_1", "ID_2"])

def change(df):
    df.loc[df.DATE != df.DATE.min(), 'VALUE'] = np.nan
    return df

grp.apply(change)
Results in:
ID_1 ID_2 DATE VALUE
0 A 1 2021-11-21 0.5
1 A 2 2021-12-19 NaN
2 A 2 2021-09-05 0.5
3 B 1 2021-11-07 0.6
4 B 1 2021-12-05 NaN
5 B 2 2021-12-26 0.6
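Note that grp.apply(change) returns a new DataFrame rather than modifying df in place, so the result needs to be assigned; a sketch with the sample data, using group_keys=False (an assumption) so the group keys don't end up in the index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID_1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ID_2': [1, 2, 2, 1, 1, 2],
    'DATE': ['2021-11-21', '2021-12-19', '2021-09-05',
             '2021-11-07', '2021-12-05', '2021-12-26'],
    'VALUE': [0.5, 0.5, 0.5, 0.6, 0.6, 0.6],
})

grp = df.groupby(["ID_1", "ID_2"], group_keys=False)

def change(g):
    g = g.copy()  # work on a copy so the original frame is untouched
    g.loc[g.DATE != g.DATE.min(), 'VALUE'] = np.nan
    return g

result = grp.apply(change).sort_index()
```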
Upvotes: 0
Reputation: 262124
You can groupby your two ID columns and compute the min with transform('min'), then mask the data where the value is not equal:
df['VALUE'] = df['VALUE'].mask(df['DATE'].ne(df.groupby(['ID_1', 'ID_2'])['DATE'].transform('min')))
Output:
ID_1 ID_2 DATE VALUE
0 A 1 2021-11-21 0.5
1 A 2 2021-12-19 NaN
2 A 2 2021-09-05 0.5
3 B 1 2021-11-07 0.6
4 B 1 2021-12-05 NaN
5 B 2 2021-12-26 0.6
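mask is the mirror image of where: it writes NaN where the condition is True, while where writes NaN where it is False. A quick self-contained check on the sample data from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID_1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ID_2': [1, 2, 2, 1, 1, 2],
    'DATE': ['2021-11-21', '2021-12-19', '2021-09-05',
             '2021-11-07', '2021-12-05', '2021-12-26'],
    'VALUE': [0.5, 0.5, 0.5, 0.6, 0.6, 0.6],
})

group_min = df.groupby(['ID_1', 'ID_2'])['DATE'].transform('min')

# NaN where the date is NOT the group minimum ...
masked = df['VALUE'].mask(df['DATE'].ne(group_min))
# ... which is equivalent to keeping values only where it IS the minimum.
kept = df['VALUE'].where(df['DATE'].eq(group_min))
```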
Upvotes: 4