SimonC
SimonC

Reputation: 53

Pandas - Compare value current year to those of all previous years by group and return True if it is the Cumulative Minimum

I have a Pandas dataframe with the following structure

id    date         num        
243   2014-12-01   3
234   2014-12-01   2
243   2015-12-01   2
234   2016-12-01   4
243   2016-12-01   6
234   2017-12-01   5
243   2018-12-01   7
234   2018-12-01   10
243   2019-12-01   1
234   2019-12-01   12
243   2020-12-01   15
234   2020-12-01   5

I want to add another column that compares the field num by id if it is smaller than any value in previous years (for each id). For example, id 243 and date 2019-12-01 has value 1. In this case the new field flag will assume True because no value in previous years were smaller for the id 243. The expected dataframe should look like the one below:

id    date         num  flag         
243   2014-12-01   3      -
234   2014-12-01   2      -
243   2015-12-01   2      True
234   2016-12-01   4      False
243   2016-12-01   6      False
234   2017-12-01   5      False
243   2018-12-01   7      False
234   2018-12-01   10      False
243   2019-12-01   1      True
234   2019-12-01   12      False
243   2020-12-01   15      False
234   2020-12-01   5      False

I am stuck in finding a solution that allows me to compare each row to those of previous years. Any suggestion how to compare each row value to those in years before?

Thanks

Upvotes: 4

Views: 360

Answers (2)

David Erickson
David Erickson

Reputation: 16683

  1. Use .cummin to get the cumulative minimum by group
  2. Use .cumcount to return the first value of each group as - with np.where

df['flag'] = (df['num'] == df.groupby(['id'])['num'].transform('cummin'))
df['flag'] = np.where(df.groupby('id').cumcount() == 0, '-', df['flag'])
df
Out[1]: 
     id       date  num   flag
0   243 2014-12-01    3      -
1   234 2014-12-01    2      -
2   243 2015-12-01    2   True
3   234 2016-12-01    4  False
4   243 2016-12-01    6  False
5   234 2017-12-01    5  False
6   243 2018-12-01    7  False
7   234 2018-12-01   10  False
8   243 2019-12-01    1   True
9   234 2019-12-01   12  False
10  243 2020-12-01   15  False
11  234 2020-12-01    5  False

Minor note: Instead of np.where(), you can also use:

df['flag'] = df['flag'].where(df.groupby('id').cumcount() != 0, '-')

which essentially does the exact same thing.

In one of line of code:

(df.num == df.groupby('id').num.cummin()).where(df.groupby('id').cumcount() != 0, '-')

Upvotes: 1

BENY
BENY

Reputation: 323276

Let us use cummin + duplicated with where

(df['num']==df.groupby('id')['num'].cummin()).where(df.id.duplicated(),'-')
0         -
1         -
2      True
3     False
4     False
5     False
6     False
7     False
8      True
9     False
10    False
11    False
Name: num, dtype: object

Upvotes: 0

Related Questions