Vamsi Nimmala
Vamsi Nimmala

Reputation: 507

get 2nd min and max into a new column after groupby performed column

I would like to find the minimum value of the column date (pd.to_datetime format) other than "1777-07-07" which is basically outlier. The input dataframe is as shown

    col2        date
    b1a2  1777-07-07
    b1a2  2012-09-14
    b1a2  1777-07-07
    b1a2  1777-07-07
    b1a2  2017-09-14
    b1a2  2019-09-24
    b1a2  2012-09-14
    b1a2  2012-09-14
    b1a2  2012-09-28
    a1l2  1777-07-07
    a1l2  2012-09-24
    a1l2  2012-09-24
    a1l2  2002-09-28
    a1l2  2012-09-24
    a1l2  2008-09-14
    a1l2  2012-09-24

So when i tred the following

    df = df.join(df.groupby(['col2'])['date'].agg({'earliest':'min'}),on=['disability_case_id'])
    df = df.join(df.groupby(['col2'])['date'].agg({'latest':'max'}),on=['disability_case_id'])

This one gives me both max and min values as shown

    col2        date earliset   latest
    b1a2  1777-07-07 1777-07-07 2019-09-24
    b1a2  2012-09-14 1777-07-07 2019-09-24
    b1a2  2017-09-14 1777-07-07 2019-09-24
    b1a2  2019-09-24 1777-07-07 2019-09-24
    b1a2  2012-09-14 1777-07-07 2019-09-24
    b1a2  2012-09-14 1777-07-07 2019-09-24
    b1a2  2012-09-28 1777-07-07 2019-09-24
    a1l2  1777-07-07 1777-07-07 2012-09-28
    a1l2  2012-09-24 1777-07-07 2012-09-28
    a1l2  2012-09-28 1777-07-07 2012-09-28
    a1l2  2002-09-28 1777-07-07 2012-09-28
    a1l2  2012-09-24 1777-07-07 2012-09-28
    a1l2  2008-09-14 1777-07-07 2012-09-28
    a1l2  2012-09-24 1777-07-07 2012-09-28

But I want to avoid the outlier, My expected output is

b1a2  1777-07-07 2012-09-14 2019-09-24
b1a2  2012-09-14 2012-09-14 2019-09-24
b1a2  2017-09-14 2012-09-14 2019-09-24
b1a2  2019-09-24 2012-09-14 2019-09-24
b1a2  2012-09-14 2012-09-14 2019-09-24
b1a2  2012-09-14 2012-09-14 2019-09-24
b1a2  2012-09-28 2012-09-14 2019-09-24
a1l2  1777-07-07 2002-09-28 2012-09-28
a1l2  2012-09-24 2002-09-28 2012-09-28
a1l2  2012-09-28 2002-09-28 2012-09-28
a1l2  2002-09-28 2002-09-28 2012-09-28
a1l2  2012-09-24 2002-09-28 2012-09-28
a1l2  2008-09-14 2002-09-28 2012-09-28
a1l2  2012-09-24 2002-09-28 2012-09-28

Upvotes: 1

Views: 40

Answers (2)

ALollz
ALollz

Reputation: 59529

With a constant outlier, mask before the groupby. Use transform to broadcast back to the original DataFrame.

df['date'] = pd.to_datetime(df.date)

s = df.date.where(df.date.ne('1777-07-07')).groupby(df.col2)
df['earliest'] = s.transform('min')
df['latest'] = s.transform('max')

Output:

    col2       date   earliest     latest
0   b1a2 1777-07-07 2012-09-14 2019-09-24
1   b1a2 2012-09-14 2012-09-14 2019-09-24
2   b1a2 1777-07-07 2012-09-14 2019-09-24
3   b1a2 1777-07-07 2012-09-14 2019-09-24
4   b1a2 2017-09-14 2012-09-14 2019-09-24
5   b1a2 2019-09-24 2012-09-14 2019-09-24
6   b1a2 2012-09-14 2012-09-14 2019-09-24
7   b1a2 2012-09-14 2012-09-14 2019-09-24
8   b1a2 2012-09-28 2012-09-14 2019-09-24
9   a1l2 1777-07-07 2002-09-28 2012-09-24
10  a1l2 2012-09-24 2002-09-28 2012-09-24
11  a1l2 2012-09-24 2002-09-28 2012-09-24
12  a1l2 2002-09-28 2002-09-28 2012-09-24
13  a1l2 2012-09-24 2002-09-28 2012-09-24
14  a1l2 2008-09-14 2002-09-28 2012-09-24
15  a1l2 2012-09-24 2002-09-28 2012-09-24

Upvotes: 2

user3483203
user3483203

Reputation: 51175

Mask invalid values and continue as you were before.


u = df['date'].mask(df['date'].eq('1777-07-07')).groupby(df['col2']).agg(['min', 'max'])

df.merge(u, left_on='col2', right_index=True)

    col2       date        min        max
0   b1a2 1777-07-07 2012-09-14 2019-09-24
1   b1a2 2012-09-14 2012-09-14 2019-09-24
2   b1a2 1777-07-07 2012-09-14 2019-09-24
3   b1a2 1777-07-07 2012-09-14 2019-09-24
4   b1a2 2017-09-14 2012-09-14 2019-09-24
5   b1a2 2019-09-24 2012-09-14 2019-09-24
6   b1a2 2012-09-14 2012-09-14 2019-09-24
7   b1a2 2012-09-14 2012-09-14 2019-09-24
8   b1a2 2012-09-28 2012-09-14 2019-09-24
9   a1l2 1777-07-07 2002-09-28 2012-09-24
10  a1l2 2012-09-24 2002-09-28 2012-09-24
11  a1l2 2012-09-24 2002-09-28 2012-09-24
12  a1l2 2002-09-28 2002-09-28 2012-09-24
13  a1l2 2012-09-24 2002-09-28 2012-09-24
14  a1l2 2008-09-14 2002-09-28 2012-09-24
15  a1l2 2012-09-24 2002-09-28 2012-09-24

Upvotes: 1

Related Questions