Reputation: 1485
I have a DataFrame that looks like this:
Date Yearly_cost
2009-01-01 230
2010-03-03 260
2009-01-01 320
2007-03-02 430
The DataFrame contains duplicate values in Date, each with a different value for Yearly_cost. I want to group by Date so that I get a consistent time series with all the corresponding values for each day. However, I want the result to be a DataFrame rather than a groupby object.
The desired result would look like this:
Date Yearly_cost
2007-03-02 430
2009-01-01 230, 320
2010-03-03 260
Any help would be appreciated.
Upvotes: 0
Views: 54
Reputation: 71560
To answer the revised question, use:
df.groupby('Date')['Yearly_cost'].apply(list).reset_index(name='Yearly_cost')
If you want single-element lists such as [320] unwrapped to the scalar 320, do:
df.groupby('Date')['Yearly_cost'].apply(list).apply(lambda x: x[0] if len(x) == 1 else x).reset_index(name='Yearly_cost')
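For reference, here is a minimal runnable sketch of both one-liners applied to the sample data from the question. The variable names as_lists and unwrapped are only illustrative, and the unwrapping is applied as a separate step on a copy so the two results can be compared side by side.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["2009-01-01", "2010-03-03", "2009-01-01", "2007-03-02"],
    "Yearly_cost": [230, 260, 320, 430],
})

# Collect every Yearly_cost for a Date into a list; reset_index turns
# the grouped Series back into a regular two-column DataFrame.
as_lists = (
    df.groupby("Date")["Yearly_cost"]
      .apply(list)
      .reset_index(name="Yearly_cost")
)

# Optional: unwrap single-element lists into plain scalars.
unwrapped = as_lists.copy()
unwrapped["Yearly_cost"] = unwrapped["Yearly_cost"].apply(
    lambda x: x[0] if len(x) == 1 else x
)

print(as_lists)
print(unwrapped)
```

Note that the unwrapped column mixes scalars and lists, so it has object dtype; keeping every cell as a list is usually easier to process later.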
Upvotes: 1
Reputation: 2016
Say you have the following df:
import pandas as pd
df1 = pd.DataFrame({'Date': ['2009-01-01', '2009-01-01', '2010-03-03', '2010-03-03', '2004-04-03', '2007-03-02'],
                    'Yearly_cost': [230, 460, 260, 250, 320, 430]})
df1
Date Yearly_cost
0 2009-01-01 230
1 2009-01-01 460
2 2010-03-03 260
3 2010-03-03 250
4 2004-04-03 320
5 2007-03-02 430
You can do the following:
# str.join only accepts strings, so cast the integer column first
df1['Yearly_cost'] = df1.Yearly_cost.astype(str)
df1.groupby('Date')['Yearly_cost'].apply(','.join)
Result:
Date
2004-04-03 320
2007-03-02 430
2009-01-01 230,460
2010-03-03 260,250
Name: Yearly_cost, dtype: object
To turn the result above into a DataFrame:
df1['Yearly_cost'] = df1.Yearly_cost.astype(str)
combined = df1.groupby('Date')['Yearly_cost'].apply(','.join)
pd.DataFrame(combined)
And you'll get:
Yearly_cost
Date
2004-04-03 320
2007-03-02 430
2009-01-01 230,460
2010-03-03 260,250
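In the output above, Date is the index rather than a column. If the two-column layout from the question is wanted, one possible variant (a sketch, not part of the original answer) converts the values to strings through assign, so df1 keeps its integer column, and then calls reset_index:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": ["2009-01-01", "2009-01-01", "2010-03-03",
             "2010-03-03", "2004-04-03", "2007-03-02"],
    "Yearly_cost": [230, 460, 260, 250, 320, 430],
})

# assign returns a new frame with a string version of the column,
# leaving df1 itself unchanged. The values are then joined per Date,
# and reset_index moves Date from the index back into a column.
combined = (
    df1.assign(Yearly_cost=df1["Yearly_cost"].astype(str))
       .groupby("Date")["Yearly_cost"]
       .agg(",".join)
       .reset_index()
)

print(combined)
```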
Upvotes: 1