Jio
Jio

Reputation: 608

How to aggregate last n records of a group object in pandas

Is it possible to find count of nulls in last 5 records of a column of a group object in pandas and save in a new column. Lets say we have a group (00936) in a dataframe with 10 records as shown below. I don't want to count the total nulls in the column 'Obse' but only the nulls in last 5 months. And this count (i.e. 4.0) i want to fill in a new column.

        code    months  Obse
2903    00936   2019-02 4.0
2904    00936   2019-03 2.0
2905    00936   2019-04 1.0
2906    00936   2019-05 NaN
2907    00936   2019-06 NaN
2908    00936   2019-07 2.0
2909    00936   2019-08 NaN
2910    00936   2019-09 NaN
2911    00936   2019-10 NaN

I am trying to use tail function but it won't work because it returns less records than length of group.


df['count_nulls'] = df.groupby('code').tail(5).Obse.transform(lambda group: group.isnull().sum())

How can I achieve this. below is expected output for above input dataframe

        code    months  Obse count_nulls
2903    00936   2019-02 4.0  4.0
2904    00936   2019-03 NaN  4.0
2905    00936   2019-04 1.0  4.0
2906    00936   2019-05 NaN  4.0
2907    00936   2019-06 NaN  4.0
2908    00936   2019-07 2.0  4.0
2909    00936   2019-08 NaN  4.0
2910    00936   2019-09 NaN  4.0
2911    00936   2019-10 NaN  4.0

Upvotes: 2

Views: 1202

Answers (2)

BENY
BENY

Reputation: 323266

How about

df['New']=df.code.map((5-df.groupby('code').tail(5).groupby('code')['Obse'].count()))

df
Out[152]: 
      code   months  Obse  New
2903   936  2019-02   4.0    4
2904   936  2019-03   2.0    4
2905   936  2019-04   1.0    4
2906   936  2019-05   NaN    4
2907   936  2019-06   NaN    4
2908   936  2019-07   2.0    4
2909   936  2019-08   NaN    4
2910   936  2019-09   NaN    4
2911   936  2019-10   NaN    4

Upvotes: 3

Quang Hoang
Quang Hoang

Reputation: 150745

How about moving tail inside the lambda function:

df['count_nulls'] = (df.groupby(['code'])['Obse']
                       .transform(lambda x: x.tail(5).isna().sum())
                    )

Output:

      code   months  Obse  count_nulls
2903   936  2019-02   4.0          4.0
2904   936  2019-03   2.0          4.0
2905   936  2019-04   1.0          4.0
2906   936  2019-05   NaN          4.0
2907   936  2019-06   NaN          4.0
2908   936  2019-07   2.0          4.0
2909   936  2019-08   NaN          4.0
2910   936  2019-09   NaN          4.0
2911   936  2019-10   NaN          4.0

Upvotes: 5

Related Questions