John Taylor
John Taylor

Reputation: 737

pandas reshape only two columns

Given this sample dataframe:

Provider    Month    Active   Acute   Hospital   Total
Joe         January  0        1       0          1
Mary        February 1        3       5          9
Susie       January  2        2       4          8
Susie       January  3        0       0          3
Mary        February 4        3       3          10
Joe         February 1        0       4          5

I am trying to reshape the data but only reshaping the Month and Total columns. I have tried pivot, pivot table, unstack and melt, but no combo seems to work. The desired dataframe output is below with the "Active" value totaled under the month column name which is taken from the Month column and all the other values under their column headers as in the above df:

Provider    January  Acute   Hospital   Total   February  Acute  Hospital  Total
Joe         0        1       0          1       1         0      4         5
Susie       5        2       4          11      0         0      0         0
Mary        0        0       0          0       5         6      8         19

So, the "Active" values from the above dataframe now appear under the Month Name and the remainder stay where they are. Ideas? I am sure it is some combo of various reshaping tools that will get me there, but I haven't been able to make that connection just yet.

Upvotes: 1

Views: 196

Answers (2)

Valdi_Bo
Valdi_Bo

Reputation: 31011

My first thought is that you should have a MultiIndex on columns of the result. The first level should be month name and the second - names of your source columns with integer data. See column names in the final result.

Start from creation of a list of months:

months = ['January', 'February']

If you have more months in the source DataFrame, extend the above list so that it includes all existing months.

Then create an auxiliary DataFrame - a pivot_table from the source DataFrame:

wrk = df.pivot_table(index='Provider', columns='Month', aggfunc='sum',
    fill_value=0)

For now it has wrong order of levels in columns, so the next step is to reverse it:

wrk.columns = wrk.columns.swaplevel()

And to get the intended result, reindex wrk:

result = wrk.reindex(columns=pd.MultiIndex.from_product([
    months, ['Active', 'Acute', 'Hospital', 'Total']]), fill_value=0)

I added fill_value in the last instruction, to have a month filled with zeroes if your source data failed to include data from this month. Try e.g. to add 'March' to months and you will see the result.

For your source data, the result is:

         January                      February                     
          Active Acute Hospital Total   Active Acute Hospital Total
Provider                                                           
Joe            0     1        0     1        1     0        4     5
Mary           0     0        0     0        5     6        8    19
Susie          5     2        4    11        0     0        0     0

Note that the order of rows is ascending and in my opinion the result should be just like above (your expected result contains rows placed in an arbitrary order).

Upvotes: 0

mozway
mozway

Reputation: 261880

I would suggest to use a pivot_table with a MultiIndex:

df2 = (    df.pivot_table(index='Provider', columns='Month', fill_value=0, aggfunc='sum', sort=False)
   .swaplevel(axis=1)
   .sort_index(axis=1, level='Month', sort_remaining=False)
 )

Output:

Month    February                      January                     
           Active Acute Hospital Total  Active Acute Hospital Total
Provider                                                           
Joe             1     0        4     5       0     1        0     1
Mary            5     6        8    19       0     0        0     0
Susie           0     0        0     0       5     2        4    11

If you insist on having the provided format, you can rework the column headers (but will have ambiguous duplicated names):

df2.columns = df2.columns.map(lambda x: x[0] if x[1]=='Active' else x[1])

Output:

          February  Acute  Hospital  Total  January  Acute  Hospital  Total
Provider                                                                   
Joe              1      0         4      5        0      1         0      1
Mary             5      6         8     19        0      0         0      0
Susie            0      0         0      0        5      2         4     11

Upvotes: 3

Related Questions