TisButaScratch

Reputation: 173

Group by multiple columns creating new column in pandas dataframe

I have a pandas DataFrame with two columns: ['company'], which is a string, and ['publication_datetime'], which is a datetime.

I want to group by company and publication_date, adding a new column with the maximum publication_datetime for each record.

So far I have tried:

issuers = news[['company','publication_datetime']]
issuers['publication_date'] = issuers['publication_datetime'].dt.date
issuers['publication_datetime_max'] = issuers.groupby(['company','publication_date'], as_index=False)['publication_datetime'].max() 

My group by does not appear to work.

I get the following error:

ValueError: Wrong number of items passed 3, placement implies 1

Upvotes: 1

Views: 3253

Answers (2)

economy

Reputation: 4251

You need the transform() method to broadcast the group result back to the original shape of the dataframe.

issuers['max'] = issuers.groupby(['company', 'publication_date'])['publication_datetime'].transform('max')

Your original groupby() call, with as_index=False, returns a DataFrame with three columns (the two group keys and the aggregated values), which is why pandas complains about 3 items where it expects 1. But even if you extracted only the values, the aggregation collapses each group into a single row, so you would have fewer values than rows in the original dataframe.
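To make the shape mismatch concrete, here is a minimal sketch. The sample rows are made up for illustration (only the column names come from the question), so treat it as a demonstration under those assumptions rather than a reproduction of the original data.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
issuers = pd.DataFrame({
    "company": ["A", "A", "A", "B"],
    "publication_datetime": pd.to_datetime([
        "2020-01-01 09:00", "2020-01-01 17:30",
        "2020-01-02 08:00", "2020-01-01 12:00",
    ]),
})
issuers["publication_date"] = issuers["publication_datetime"].dt.date

# Aggregating collapses each (company, publication_date) group to one row
# and, with as_index=False, keeps the two key columns next to the result.
aggregated = issuers.groupby(
    ["company", "publication_date"], as_index=False
)["publication_datetime"].max()

print(aggregated.shape)  # (3, 3): three groups, three columns
print(issuers.shape)     # (4, 3): four original rows
```

The aggregated frame has three columns and one row fewer than issuers, so it cannot be assigned to a single column of the original.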

The transform() method returns the group results for each row of the dataframe in a way that makes it easy to create a new column. The returned values are an indexed Series with the indices being the original ones from the issuers dataframe.
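As a small illustration of that behaviour, the sketch below runs transform('max') on made-up data. The sample rows are hypothetical, and the column name publication_datetime_max is borrowed from the question rather than from the snippet above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
issuers = pd.DataFrame({
    "company": ["A", "A", "A", "B"],
    "publication_datetime": pd.to_datetime([
        "2020-01-01 09:00", "2020-01-01 17:30",
        "2020-01-02 08:00", "2020-01-01 12:00",
    ]),
})
issuers["publication_date"] = issuers["publication_datetime"].dt.date

# transform('max') returns one value per original row, aligned on the
# original index, so it can be assigned directly as a new column.
group_max = issuers.groupby(
    ["company", "publication_date"]
)["publication_datetime"].transform("max")

issuers["publication_datetime_max"] = group_max
print(issuers[["company", "publication_datetime", "publication_datetime_max"]])
```

Both rows for company A on 2020-01-01 receive the later of the two timestamps, while the remaining rows, each alone in its group, keep their own value.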

Hope this helps! See the pandas documentation for transform().

Upvotes: 1

Jules

Reputation: 445

The problem is that you are trying to assign a DataFrame to a single column. Doing the following will extract only the values, without the two index columns:

issuers['publication_datetime_max'] = issuers.groupby(['company','publication_date'], as_index=False)['publication_datetime'].max().tolist()

Hope this helps!

Upvotes: 0
