Alexandre Lara

Reputation: 2568

Datetime columns change after groupby in Pandas

I have the following dataframe:

> df.head(7)
id  user_id     date_created_status         date_created_user
24  10          2015-02-25 17:01:21-03:00   2015-02-25 17:00:27-03:00
26  1           2015-02-26 00:18:10-03:00   2015-02-23 16:37:58-03:00
29  9           2015-02-28 07:23:53-03:00   2015-02-25 16:12:11-03:00
30  7           2015-03-03 03:22:45-03:00   2015-02-24 01:24:08-03:00
31  7           2015-03-03 03:24:53-03:00   2015-02-24 01:24:08-03:00
38  13          2015-03-04 19:11:16-03:00   2015-03-04 19:09:27-03:00
39  14          2015-03-04 19:19:16-03:00   2015-03-04 19:17:47-03:00

After sorting and grouping the dataframe (taking the first element of each group), the date_created_status and date_created_user columns change their date format.

> df.sort_values('date_created_status', inplace=True)
> df = df.groupby('user_id', as_index=False).first()
id  user_id  date_created_status               date_created_user
1   26       2015-02-26T03:18:10.000000000     2015-02-23T19:37:58.000000000
2   46352    2016-01-22 15:50:40.516000-02:00  2015-02-23 16:37:58-03:00
4   62       2015-03-10 17:14:27-03:00         2015-02-23 16:37:58-03:00
7   30       2015-03-03 03:22:45-03:00         2015-02-24 01:24:08-03:00
8   3274     2015-06-16 18:37:29.056000-03:00  2015-02-24 15:30:02-03:00
9   29       2015-02-28 07:23:53-03:00         2015-02-25 16:12:11-03:00
10  24       2015-02-25 17:01:21-03:00         2015-02-25 17:00:27-03:00
12  1223     2015-05-05 09:39:26.530000-03:00  2015-02-27 14:43:10-03:00

If I try calling strftime on any of these datetime columns, I get an error:

> df['signup_period'] = df.date_created_user.apply(lambda x: x.strftime('%Y-%m'))
...
AttributeError: 'numpy.datetime64' object has no attribute 'strftime'

How can I sort and group these rows without "breaking" the datetime?

Upvotes: 0

Views: 311

Answers (1)

BENY

Reputation: 323306

You can use head here:

df.sort_values('date_created_status', inplace=True)
df = df.groupby('user_id', as_index=False).head(1)
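
The reason head helps, as far as I can tell, is that first() is an aggregation: it rebuilds each column from the first non-null value per group, whereas head(1) just returns the original rows untouched. A minimal sketch of that difference, using a made-up frame (the columns a and b are hypothetical and not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: user 7 has a missing value in its first row.
df = pd.DataFrame({
    "user_id": [7, 7, 9],
    "a": [np.nan, 2.0, 3.0],
    "b": ["x", "y", "z"],
})

# first() aggregates column by column and skips nulls,
# so the result for user 7 mixes values from two different rows.
agg = df.groupby("user_id", as_index=False).first()

# head(1) selects the first row of each group as-is,
# keeping the original index and the original values.
rows = df.groupby("user_id").head(1)

print(agg)
print(rows)
```

Because head(1) only selects rows, the columns are not rebuilt, which is presumably why the datetime values survive.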

Or drop_duplicates:

df.sort_values('date_created_status', inplace=True)
df = df.drop_duplicates('user_id', keep='first')
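
For completeness, here is a small end-to-end sketch with a few rows taken from the question. It assumes the timestamps are parsed with utc=True, which is my choice for the example (the question does not show how the columns were created), so the values are held in UTC. With a proper datetime column, the .dt accessor can be used instead of apply:

```python
import pandas as pd

# A few rows from the question; parsing with utc=True is an assumption
# made here so both columns get a single timezone-aware datetime dtype.
df = pd.DataFrame({
    "id": [24, 26, 30, 31],
    "user_id": [10, 1, 7, 7],
    "date_created_status": pd.to_datetime(
        ["2015-02-25 17:01:21-03:00", "2015-02-26 00:18:10-03:00",
         "2015-03-03 03:22:45-03:00", "2015-03-03 03:24:53-03:00"],
        utc=True,
    ),
    "date_created_user": pd.to_datetime(
        ["2015-02-25 17:00:27-03:00", "2015-02-23 16:37:58-03:00",
         "2015-02-24 01:24:08-03:00", "2015-02-24 01:24:08-03:00"],
        utc=True,
    ),
})

# Keep the earliest status row per user without aggregating,
# then derive the signup period with the vectorized .dt accessor.
first_rows = (
    df.sort_values("date_created_status")
      .drop_duplicates("user_id", keep="first")
      .assign(signup_period=lambda d: d["date_created_user"].dt.strftime("%Y-%m"))
)

print(first_rows.dtypes)
print(first_rows)
```

Note that strftime here formats the UTC value; if you need local months, convert with .dt.tz_convert first.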

Upvotes: 1
