Tree

Reputation: 31371

How to group a dataframe by date to get an array of ids for each group?

Here is my dataframe:

id - title - publish_up - date

1  - Exampl- 2019-12-1  - datetime

...

I created a date column by applying

df['date'] = pd.to_datetime(df['publish_up'], format='%Y-%m-%d')

I am new to Python and am trying to learn pandas. What I would like to do is create one group for each day of the year.

The dataframe contains data spanning one year, so in theory there should be 365 groups.

Then, I would need to get an array of ids for each group.

example:

[{date:'2019-12-1',ids:[1,2,3,4,5,6]},{date:'2019-12-2',ids:[7,8,9,10,11,12,13,14]},...]

Thank you

Upvotes: 1

Views: 637

Answers (1)

jezrael

Reputation: 863166

If you want the dates as strings in the output list, converting to datetimes is not necessary. Build a list of ids per group with GroupBy.agg, convert the result to a DataFrame with Series.reset_index, and finally create a list of dicts with DataFrame.to_dict:

print (df)
   id   title publish_up      date
0   1  Exampl  2019-12-2  datetime
1   2  Exampl  2019-12-2  datetime
2   2  Exampl  2019-12-1  datetime

# if necessary, change the format from 2019-12-1 to 2019-12-01
#df['publish_up'] = pd.to_datetime(df['publish_up'], format='%Y-%m-%d').dt.strftime('%Y-%m-%d')

print (df.groupby('publish_up')['id'].agg(list).reset_index())
  publish_up      id
0  2019-12-1     [2]
1  2019-12-2  [1, 2]

# 'records' is the full orient name; the 'r' shorthand was removed in pandas 2.0
a = df.groupby('publish_up')['id'].agg(list).reset_index().to_dict('records')
print (a)
[{'publish_up': '2019-12-1', 'id': [2]}, {'publish_up': '2019-12-2', 'id': [1, 2]}]
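
To get the exact keys from the question (date and ids), the same chain can be extended with a rename. This is a minimal sketch under some assumptions: the sample rows are made up to mirror the question's columns, and the dates are normalized to zero-padded strings first so the groups sort chronologically.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns
df = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Exampl", "Exampl", "Exampl"],
    "publish_up": ["2019-12-2", "2019-12-2", "2019-12-1"],
})

# Normalize to zero-padded date strings (2019-12-1 -> 2019-12-01)
df["date"] = (
    pd.to_datetime(df["publish_up"], format="%Y-%m-%d")
      .dt.strftime("%Y-%m-%d")
)

result = (
    df.groupby("date")["id"]
      .agg(list)            # one list of ids per date
      .rename("ids")        # name the value column as in the question
      .reset_index()        # Series -> DataFrame with columns date, ids
      .to_dict("records")   # one dict per row
)
print(result)
```

Note that only dates that actually occur in the data produce a group, so a year with gaps will yield fewer than 365 entries.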

Upvotes: 3
