Pandas groupby transform to get not null date value

Question

I have a dataframe constructed as so:

df = pd.DataFrame({'id': [1,2,3,4,1,2,3,4],
                   'birthdate': ['01-01-01','02-02-02','03-03-03','04-04-04',
                                 '','02-02-02','03-04-04','04-03-04']})
df['birthdate'] = pd.to_datetime(df['birthdate'])

I want to do a groupby to change the original data using pandas .transform

The condition is that I want to pick the birthdate value of the first not null row per id

I know I can do max if no other option is available to get rid of the not null entries, but if there are inconsistencies, I don't necessarily want the maximum date, just the one that occurs first in the dataframe.

As such:

df['birthdate'] = df.groupby('id')['birthdate'].transform(max)

This is how output looks using max:

id  birthdate
0   1 2001-01-01
1   2 2002-02-02
2   3 2003-03-03
3   4 2004-04-04
4   1 2001-01-01
5   2 2002-02-02
6   3 2004-03-04
7   4 2004-04-04

This is how I actually want it to look:

id  birthdate
0   1 2001-01-01
1   2 2002-02-02
2   3 2003-03-03
3   4 2004-04-04
4   1 2001-01-01
5   2 2002-02-02
6   3 2003-03-03
7   4 2004-04-04

I'm pretty sure I have to create a customer lambda to put inside the .transform but I am unsure what condition to use.

Josmoor98 · Accepted Answer

You can try the following. Your dataframe definition and suggested outputs contain different dates, so I assumed your dataframe definition was correct

df['birthdate'] = df.groupby('id').transform('first')

which outputs.

   id   birthdate
0   1   2001-01-01
1   2   2002-02-02
2   3   2003-03-03
3   4   2004-04-04
4   1   2001-01-01
5   2   2002-02-02
6   3   2003-03-03
7   4   2004-04-04

Pandas groupby transform to get not null date value

Answers (1)

Related Questions