Reputation: 322
I've got a pandas dataframe that has a multiindex of Month, then Day, and then a single column of data values, looks something like this:
Data
1 1 6
2 10
3 11
4 12
5 50
...
12 1 3
2 4
3 10
4 11
You get the point. I've groupby'd the month and date. I want to make a new column that formats the Month-Date together. But cant seem to figure it out. Desired output would look something like this:
M-D Data
1-01 6
1-02 10
1-03 11
1-04 14
1-05 50
Can anyone offer an optimal solution for this? Note: I created the index earlier using the following code, perhaps I need to go back and do some restructuring? I made this a multiindex so that I can find the min() values. The original 'Month' and 'Day' columns had Months and Days over a span of 10 years and I only needed the minimum of those 10 years hence the groupby().
dframe['Date'] = pd.to_datetime(dframe['Date'])
dframe['Month'], dframe['Day'] = dframe['Date'].dt.month,
dframe['Date'].dt.day
finaldf = dframe.groupby(['Month','Day'])['Data_Value'].min()
df = pd.DataFrame(finaldf)
Thanks in advance!
Upvotes: 1
Views: 1878
Reputation: 294576
I'd take @Psidom's advice. However, to solve the question that was asked:
I'd use format
df.assign(**{'M-D': df.index.map('{0[0]}-{0[1]}'.format).rename('M-D')})
Data M-D
1 1 6 1-1
2 10 1-2
3 11 1-3
4 12 1-4
5 50 1-5
12 1 3 12-1
2 4 12-2
3 10 12-3
4 11 12-4
Setup
df = pd.DataFrame(dict(
Data=[6, 10, 11, 12, 50, 3, 4, 10, 11],
), [[1] * 5 + [12] * 4, list(range(1, 6)) + list(range(1, 5))])
Upvotes: 2
Reputation: 433
It's a bit tricky to work with multiIndex, I'm not sure that this is very time efficient, but it works:
df["M-D"] = [str(df.index[i][0])+'-'+str(df.index[i][1]) for i in range(len(df))]
It gives the following result:
In[143]: df.head()
Out[143]:
Open New_date
Month Day
1 1 9597 1-1
2 9370 1-2
3 9365 1-3
4 9766 1-4
5 9446 1-5
Upvotes: 0
Reputation: 215137
If you need month and day to be formatted as m-d
, then not to separate them from the beginning is a more reasonable approach, for instance, you can format Date column as %m-%d
, group by this variable and do aggregations you need:
dframe.groupby(pd.to_datetime(dframe['Date']).dt.strftime('%m-%d'))['Data_Value'].min()
A short example:
dframe = pd.DataFrame({
'Date': ['2017-08-01', '2016-08-01', '2017-08-02'],
'Data_Value': [2,3,4]
})
dframe.groupby(pd.to_datetime(dframe['Date']).dt.strftime('%m-%d'))['Data_Value'].min()
#Date
#08-01 2
#08-02 4
#Name: Data_Value, dtype: int64
Upvotes: 3