Reputation: 173
I have a pandas dateframe of two columns ['company'] which is a string and ['publication_datetime'] which is a datetime.
I want to group by company and the publication_date , adding a new column with the maximum publication_datetime for each record.
so far i have tried:
issuers = news[['company','publication_datetime']]
issuers['publication_date'] = issuers['publication_datetime'].dt.date
issuers['publication_datetime_max'] = issuers.groupby(['company','publication_date'], as_index=False)['publication_datetime'].max()
my group by does not appear to work.
i get the following error
ValueError: Wrong number of items passed 3, placement implies 1
Upvotes: 1
Views: 3253
Reputation: 4251
You need the transform()
method to cast the result in the original dimension of the dataframe.
issuers['max'] = issuers.groupby(['company', 'publication_date'])['publication_datetime'].transform('max')
The result of your groupby()
before was returning a multi-indexed group object, which is why it's complaining about 3 values (first group, second group, and then values). But even if you just returned the values, it's combining like groups together, so you'll have fewer values than needed.
The transform()
method returns the group results for each row of the dataframe in a way that makes it easy to create a new column. The returned values are an indexed Series
with the indices being the original ones from the issuers
dataframe.
Hope this helps! Documentation for transform here
Upvotes: 1
Reputation: 445
The thing is by doing what you are doing you are trying to set a DataFrame
to a column value.
Doing the following will get extract only the values without the two indexe columns:
issuers['publication_datetime_max'] = issuers.groupby(['company','publication_date'], as_index=False)['publication_datetime'].max().tolist()
Hope this help !
Upvotes: 0