Reputation: 2507
I would like to group values by a particular column (IDs) and replace all values with the max datetime associated with a given ID.
This is the code I have written (it doesn't work):
file.groupby('data__id')['data__answered_at'].apply(lambda x: x['data__answered_at'] == x['data__answered_at'].max())
This is a sample of my dataframe
data__id data__answered_at
1 2019-01-10
1 Na
2 2019-01-12
2 Na
3 Na
4 Na
4 Na
5 Na
5 2019-01-15
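A minimal sketch to reproduce this sample, assuming the Na entries are raw placeholder strings (the answer below names the frame df):
import pandas as pd

# hypothetical reconstruction of the sample above; 'Na' stands in for
# the non-datetime placeholder in the raw data
df = pd.DataFrame({
    'data__id': [1, 1, 2, 2, 3, 4, 4, 5, 5],
    'data__answered_at': ['2019-01-10', 'Na', '2019-01-12', 'Na', 'Na',
                          'Na', 'Na', 'Na', '2019-01-15'],
})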
Upvotes: 2
Views: 125
Reputation: 863541
Use to_datetime with errors='coerce' to convert non-datetime values to NaT, then get the maximum value per group with GroupBy.transform and replace the missing values with Series.fillna:
import pandas as pd

# convert to datetime; invalid values such as 'Na' become NaT
df['data__answered_at'] = pd.to_datetime(df['data__answered_at'], errors='coerce')
# per-group maximum, aligned to the original rows
s = df.groupby('data__id')['data__answered_at'].transform('max')
df['data__answered_at'] = df['data__answered_at'].fillna(s)
print(df)
data__id data__answered_at
0 1 2019-01-10
1 1 2019-01-10
2 2 2019-01-12
3 2 2019-01-12
4 3 NaT
5 4 NaT
6 4 NaT
7 5 2019-01-15
8 5 2019-01-15
Your solution should be rewritten with a lambda function and fillna (in your attempt, x is already the data__answered_at Series of each group, so x['data__answered_at'] fails, and the == comparison only returns booleans instead of filling values):
# for each group, fill NaT with that group's maximum timestamp
f = lambda x: x.fillna(x.max())
df['data__answered_at'] = df.groupby('data__id')['data__answered_at'].apply(f)
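The same lambda also works with GroupBy.transform, which keeps the result aligned to the original index; a minimal sketch, assuming data__answered_at was already converted with to_datetime as above:
df['data__answered_at'] = (
    df.groupby('data__id')['data__answered_at']
      .transform(lambda x: x.fillna(x.max()))  # fill NaT with the group max
)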
Upvotes: 1