Reshape Pandas Dataframe

Question

Say I have the following dataframe where I have counts of some variable at 2 different seasons in 2 different years and in 3 different locations. The data is currently structured where each row is a Season / Location combination and there are columns for the count in each year. It looks something like this:

>>> df=pd.DataFrame([['Summer', 'A', 1, 2],
                     ['Winter', 'A', 3, 4],
                     ['Summer', 'B', 5, 6],
                     ['Winter', 'B', 7, 8],
                     ['Summer', 'C', 9, 10],
                     ['Winter', 'C', 11, 12]], 
                    columns=['Season', 'Location', 'Count_2014', 'Count_2015'])
>>> df
   Season Location  Count_2014  Count_2015
0  Summer        A           1           2
1  Winter        A           3           4
2  Summer        B           5           6
3  Winter        B           7           8
4  Summer        C           9          10
5  Winter        C          11          12

I want to restructure the data so that I have a row for each Season, Location, and Year combination (meaning I would have 2 x 3 x 2 = 12 rows). My current approach to doing so is certainly not the most efficient (see below). Any suggestions on the best way to restructure this dataset?

df.set_index(['Season', 'Location'], inplace=True)
ListOfDFs = []
for Year in [x[-4:] for x in df.columns]:
    SubD = df[['Count_' + Year]]
    SubD.columns = ['Count']
    SubD['Year'] = Year
    SubD.set_index('Year', append=True, inplace=True)
    ListOfDFs.append(SubD)

df2=pd.concat(ListOfDFs)
>>> df2
                      Count
Season Location Year       
Summer A        2014      1
Winter A        2014      3
Summer B        2014      5
Winter B        2014      7
Summer C        2014      9
Winter C        2014     11
Summer A        2015      2
Winter A        2015      4
Summer B        2015      6
Winter B        2015      8
Summer C        2015     10
Winter C        2015     12

chappers · Accepted Answer

You are looking for the melt functionality, which will allow you to do this in basically one line:

df_new = pd.melt(df,id_vars=['Season', 'Location'], value_vars=['Count_2014', 'Count_2015'], 
        var_name='Year', 
        value_name='Count')

Then you can use apply (or there probably is something better), to get the output you have above:

df_new['Year'] = df_new['Year'].apply(lambda x: x[-4:])

Output:

    Season Location  Year  Count
0   Summer        A  2014      1
1   Winter        A  2014      3
2   Summer        B  2014      5
3   Winter        B  2014      7
4   Summer        C  2014      9
5   Winter        C  2014     11
6   Summer        A  2015      2
7   Winter        A  2015      4
8   Summer        B  2015      6
9   Winter        B  2015      8
10  Summer        C  2015     10
11  Winter        C  2015     12

Reshape Pandas Dataframe

Answers (2)

Related Questions