Silent-J
Silent-J

Reputation: 322

Pandas Multiindex to Month-Day format in a single column

I've got a pandas dataframe that has a multiindex of Month, then Day, and then a single column of data values, looks something like this:

      Data
1 1     6
  2     10
  3     11
  4     12
  5     50
   ...
12 1    3
   2    4
   3    10
   4    11

You get the point. I've groupby'd the month and date. I want to make a new column that formats the Month-Date together. But cant seem to figure it out. Desired output would look something like this:

M-D Data 
1-01 6
1-02 10
1-03 11
1-04 14
1-05 50

Can anyone offer an optimal solution for this? Note: I created the index earlier using the following code, perhaps I need to go back and do some restructuring? I made this a multiindex so that I can find the min() values. The original 'Month' and 'Day' columns had Months and Days over a span of 10 years and I only needed the minimum of those 10 years hence the groupby().

dframe['Date'] = pd.to_datetime(dframe['Date'])
dframe['Month'], dframe['Day'] = dframe['Date'].dt.month, 
dframe['Date'].dt.day
finaldf = dframe.groupby(['Month','Day'])['Data_Value'].min()
df = pd.DataFrame(finaldf)

Thanks in advance!

Upvotes: 1

Views: 1878

Answers (3)

piRSquared
piRSquared

Reputation: 294576

I'd take @Psidom's advice. However, to solve the question that was asked:
I'd use format

df.assign(**{'M-D': df.index.map('{0[0]}-{0[1]}'.format).rename('M-D')})

      Data   M-D
1  1     6   1-1
   2    10   1-2
   3    11   1-3
   4    12   1-4
   5    50   1-5
12 1     3  12-1
   2     4  12-2
   3    10  12-3
   4    11  12-4

Setup

df = pd.DataFrame(dict(
    Data=[6, 10, 11, 12, 50, 3, 4, 10, 11],
), [[1] * 5 + [12] * 4, list(range(1, 6)) + list(range(1, 5))])

Upvotes: 2

Erlinska
Erlinska

Reputation: 433

It's a bit tricky to work with multiIndex, I'm not sure that this is very time efficient, but it works:

df["M-D"] = [str(df.index[i][0])+'-'+str(df.index[i][1]) for i in range(len(df))]

It gives the following result:

In[143]: df.head()
Out[143]: 
           Open New_date
Month Day               
1     1    9597      1-1
      2    9370      1-2
      3    9365      1-3
      4    9766      1-4
      5    9446      1-5

Upvotes: 0

akuiper
akuiper

Reputation: 215137

If you need month and day to be formatted as m-d, then not to separate them from the beginning is a more reasonable approach, for instance, you can format Date column as %m-%d, group by this variable and do aggregations you need:

dframe.groupby(pd.to_datetime(dframe['Date']).dt.strftime('%m-%d'))['Data_Value'].min()

A short example:

dframe = pd.DataFrame({
    'Date': ['2017-08-01', '2016-08-01', '2017-08-02'],
    'Data_Value': [2,3,4]
})

dframe.groupby(pd.to_datetime(dframe['Date']).dt.strftime('%m-%d'))['Data_Value'].min()

#Date
#08-01    2
#08-02    4
#Name: Data_Value, dtype: int64

Upvotes: 3

Related Questions