Reputation: 2568
I have the following dataframe:
> df.head(7)
id user_id date_created_status date_created_user
24 10 2015-02-25 17:01:21-03:00 2015-02-25 17:00:27-03:00
26 1 2015-02-26 00:18:10-03:00 2015-02-23 16:37:58-03:00
29 9 2015-02-28 07:23:53-03:00 2015-02-25 16:12:11-03:00
30 7 2015-03-03 03:22:45-03:00 2015-02-24 01:24:08-03:00
31 7 2015-03-03 03:24:53-03:00 2015-02-24 01:24:08-03:00
38 13 2015-03-04 19:11:16-03:00 2015-03-04 19:09:27-03:00
39 14 2015-03-04 19:19:16-03:00 2015-03-04 19:17:47-03:00
After sorting and grouping the dataframe (getting the first element), the date_created_status and date_created_user columns change their date format.
> df.sort_values('date_created_status', inplace=True)
> df = df.groupby('user_id', as_index=False).first()
id user_id date_created_status date_created_user
1 26 2015-02-26T03:18:10.000000000 2015-02-23T19:37:58.000000000
2 46352 2016-01-22 15:50:40.516000-02:00 2015-02-23 16:37:58-03:00
4 62 2015-03-10 17:14:27-03:00 2015-02-23 16:37:58-03:00
7 30 2015-03-03 03:22:45-03:00 2015-02-24 01:24:08-03:00
8 3274 2015-06-16 18:37:29.056000-03:00 2015-02-24 15:30:02-03:00
9 29 2015-02-28 07:23:53-03:00 2015-02-25 16:12:11-03:00
10 24 2015-02-25 17:01:21-03:00 2015-02-25 17:00:27-03:00
12 1223 2015-05-05 09:39:26.530000-03:00 2015-02-27 14:43:10-03:00
If I try calling the strftime function on any of these datetime columns, I get an error:
> df['signup_period'] = df.date_created_user.apply(lambda x: x.strftime('%Y-%m'))
...
AttributeError: 'numpy.datetime64' object has no attribute 'strftime'
How can I sort and group these rows without "breaking" the datetime?
Upvotes: 0
Views: 311
Reputation: 323306
You can use head here:
df.sort_values('date_created_status', inplace=True)
df = df.groupby('user_id', as_index=False).head(1)
Or drop_duplicates:
df.sort_values('date_created_status', inplace=True)
df = df.drop_duplicates('user_id', keep='first')
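As a rough check, here is a minimal sketch of both approaches on made-up data modelled on the question. The sample rows and the fixed -03:00 offset are assumptions, not the asker's real frame. Both calls select whole rows instead of aggregating, so the timezone-aware dtype should be kept and the vectorized .dt.strftime accessor can be used in place of apply. One difference to keep in mind: first() returns the first non-null value per column, while head(1) and drop_duplicates return the first row as it is.

```python
import pandas as pd

# Hypothetical sample data; values and the -03:00 offset are assumptions.
df = pd.DataFrame({
    "id": [24, 26, 30, 31],
    "user_id": [10, 1, 7, 7],
    "date_created_status": pd.to_datetime([
        "2015-02-25 17:01:21-03:00", "2015-02-26 00:18:10-03:00",
        "2015-03-03 03:22:45-03:00", "2015-03-03 03:24:53-03:00",
    ]),
    "date_created_user": pd.to_datetime([
        "2015-02-25 17:00:27-03:00", "2015-02-23 16:37:58-03:00",
        "2015-02-24 01:24:08-03:00", "2015-02-24 01:24:08-03:00",
    ]),
})

df = df.sort_values("date_created_status")

# Option 1: keep the first row of each group (no aggregation involved).
first_by_head = df.groupby("user_id").head(1)

# Option 2: drop later rows that repeat a user_id.
first_by_dedup = df.drop_duplicates("user_id", keep="first")

# Format the datetime column with the vectorized accessor instead of apply.
first_by_head = first_by_head.assign(
    signup_period=first_by_head["date_created_user"].dt.strftime("%Y-%m")
)
print(first_by_head.dtypes)
```

If the printed dtypes still show a timezone-aware datetime for both date columns, the strftime call from the question should no longer raise the AttributeError.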
Upvotes: 1