Reputation: 53
I have a Pandas dataframe with the following structure
id date num
243 2014-12-01 3
234 2014-12-01 2
243 2015-12-01 2
234 2016-12-01 4
243 2016-12-01 6
234 2017-12-01 5
243 2018-12-01 7
234 2018-12-01 10
243 2019-12-01 1
234 2019-12-01 12
243 2020-12-01 15
234 2020-12-01 5
I want to add another column that compares the field num by id if it is smaller than any value in previous years (for each id). For example, id 243 and date 2019-12-01 has value 1. In this case the new field flag will assume True because no value in previous years were smaller for the id 243. The expected dataframe should look like the one below:
id date num flag
243 2014-12-01 3 -
234 2014-12-01 2 -
243 2015-12-01 2 True
234 2016-12-01 4 False
243 2016-12-01 6 False
234 2017-12-01 5 False
243 2018-12-01 7 False
234 2018-12-01 10 False
243 2019-12-01 1 True
234 2019-12-01 12 False
243 2020-12-01 15 False
234 2020-12-01 5 False
I am stuck in finding a solution that allows me to compare each row to those of previous years. Any suggestion how to compare each row value to those in years before?
Thanks
Upvotes: 4
Views: 360
Reputation: 16683
.cummin
to get the cumulative minimum by group.cumcount
to return the first value of each group as -
with np.where
df['flag'] = (df['num'] == df.groupby(['id'])['num'].transform('cummin'))
df['flag'] = np.where(df.groupby('id').cumcount() == 0, '-', df['flag'])
df
Out[1]:
id date num flag
0 243 2014-12-01 3 -
1 234 2014-12-01 2 -
2 243 2015-12-01 2 True
3 234 2016-12-01 4 False
4 243 2016-12-01 6 False
5 234 2017-12-01 5 False
6 243 2018-12-01 7 False
7 234 2018-12-01 10 False
8 243 2019-12-01 1 True
9 234 2019-12-01 12 False
10 243 2020-12-01 15 False
11 234 2020-12-01 5 False
Minor note: Instead of np.where()
, you can also use:
df['flag'] = df['flag'].where(df.groupby('id').cumcount() != 0, '-')
which essentially does the exact same thing.
In one of line of code:
(df.num == df.groupby('id').num.cummin()).where(df.groupby('id').cumcount() != 0, '-')
Upvotes: 1
Reputation: 323276
Let us use cummin
+ duplicated
with where
(df['num']==df.groupby('id')['num'].cummin()).where(df.id.duplicated(),'-')
0 -
1 -
2 True
3 False
4 False
5 False
6 False
7 False
8 True
9 False
10 False
11 False
Name: num, dtype: object
Upvotes: 0