Emm

Groupby and conditional replace

I would like to group values by a particular column (the IDs) and replace all values with the max datetime associated with a given ID.

This is the code I have written (it doesn't work):

file.groupby('data__id')['data__answered_at'].apply(lambda x: x['data__answered_at'] == x['data__answered_at'].max())

This is a sample of my DataFrame:

data__id  data__answered_at
1         2019-01-10
1         Na
2         2019-01-12
2         Na
3         Na
4         Na
4         Na
5         Na
5         2019-01-15

Answers (1)

jezrael

Use to_datetime with errors='coerce' to convert non-datetime values to NaT, then get the maximum value per group with GroupBy.transform, which returns a Series with the same index as the original DataFrame, so the missing values can be replaced with Series.fillna:

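import pandas as pd

# convert strings to datetimes; unparseable values such as 'Na' become NaT
df['data__answered_at'] = pd.to_datetime(df['data__answered_at'], errors='coerce')

# maximum per group, broadcast back to every row of that group
s = df.groupby('data__id')['data__answered_at'].transform('max')
# replace only the missing values with the group maximum
df['data__answered_at'] = df['data__answered_at'].fillna(s)
print(df)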
   data__id data__answered_at
0         1        2019-01-10
1         1        2019-01-10
2         2        2019-01-12
3         2        2019-01-12
4         3               NaT
5         4               NaT
6         4               NaT
7         5        2019-01-15
8         5        2019-01-15
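A self-contained sketch of the same approach, assuming the missing entries are the literal string 'Na' exactly as shown in the question's sample. It also shows the variant where every value (not only the missing ones) is overwritten with the group maximum, as the question's wording suggests; the names group_max, filled and overwritten are illustrative only.

```python
import pandas as pd

# Sample data from the question; 'Na' is assumed to be a literal string
df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 3, 4, 4, 5, 5],
    "data__answered_at": ["2019-01-10", "Na", "2019-01-12", "Na", "Na",
                          "Na", "Na", "Na", "2019-01-15"],
})

# Unparseable strings such as 'Na' become NaT
df["data__answered_at"] = pd.to_datetime(df["data__answered_at"], errors="coerce")

# Group maximum broadcast to every row; groups that are all NaT stay NaT
group_max = df.groupby("data__id")["data__answered_at"].transform("max")

# Variant 1: fill only the missing values
filled = df["data__answered_at"].fillna(group_max)

# Variant 2: overwrite every value with the group maximum
overwritten = group_max

print(filled.tolist())
```

Assigning either result back to df['data__answered_at'] updates the frame. The two variants only differ when a group contains more than one distinct non-missing date; for the sample data they are identical.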

Your solution can be rewritten with a lambda function and fillna:

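# assumes data__answered_at was already converted with pd.to_datetime(..., errors='coerce')
f = lambda x: x.fillna(x.max())
# transform keeps the original index (apply may prepend the group keys in newer pandas)
df['data__answered_at'] = df.groupby('data__id')['data__answered_at'].transform(f)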
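A minimal sketch of the lambda variant, assuming the column has already been converted to datetimes (missing entries as NaT). It compares GroupBy.transform with apply called on a groupby created with group_keys=False; passing group_keys=False keeps apply from adding the group key as an extra index level, which recent pandas versions may otherwise do for like-indexed results. The version-specific behavior is an assumption worth checking against your installed pandas; via_transform and via_apply are illustrative names.

```python
import pandas as pd

# Sample data with missing values already represented as NaT
df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 3, 4, 4, 5, 5],
    "data__answered_at": pd.to_datetime(
        ["2019-01-10", None, "2019-01-12", None, None, None, None, None, "2019-01-15"]
    ),
})

# Fill missing values inside each group with that group's maximum
f = lambda x: x.fillna(x.max())

# transform always returns a result aligned to the original index
via_transform = df.groupby("data__id")["data__answered_at"].transform(f)

# apply with group_keys=False avoids an extra group-key index level
via_apply = df.groupby("data__id", group_keys=False)["data__answered_at"].apply(f)

df["data__answered_at"] = via_transform
```

transform is the more direct choice here because the lambda returns one value per input row, which is exactly the contract transform enforces.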
