verkter
verkter

Reputation: 778

Pandas - Reshape / Transform Dataframe with Multiple Columns into a Single Column of values

I have a pandas dataframe with years as columns, countries as row names:

Country       | 1960 | 1961 | 1962 | 1963
-----------------------------------------
United States | 1000 | 2000 | 3000 | 4000
-----------------------------------------
Argentina     | 1000 | 2000 | 3000 | 4000
-----------------------------------------

I would like to transform it into:

Country       | Year | Value
-----------------------------
Unites States | 1960 | 1000
Unites States | 1961 | 2000
Unites States | 1962 | 3000
Unites States | 1963 | 4000
Argentina     | 1960 | 1000
Argentina     | 1961 | 2000
Argentina     | 1962 | 3000
Argentina     | 1963 | 4000

I am not sure what split, sort or group operations need to be applied to achieve this goal.

Thank you!

Upvotes: 2

Views: 1001

Answers (3)

wflynny
wflynny

Reputation: 18521

Just to give a complete example,

In [1]: df = pd.DataFrame([['United States', 1000, 2000, 3000, 4000],
                           ['Argentina', 1000, 2000, 3000, 4000]],
                          columns=['Country', 1960, 1961, 1962, 1963])

In [2]: df.set_index('Country', inplace=True)
In [3]: df = df.stack().reset_index()
In [4]: df.columns = ['Country', 'Year', 'Value']

Yields

         Country  Year  Value
0  United States  1960   1000
1  United States  1961   2000
2  United States  1962   3000
3  United States  1963   4000
4      Argentina  1960   1000
5      Argentina  1961   2000
6      Argentina  1962   3000
7      Argentina  1963   4000

To get rid of the index column and use the Country column as the index, you can use

In [3]: df = df.stack().reset_index(1)
In [4]: df.columns = ['Year', 'Value']

which yields

               Year  Value
Country                   
United States  1960   1000
United States  1961   2000
United States  1962   3000
United States  1963   4000
Argentina      1960   1000
Argentina      1961   2000
Argentina      1962   3000
Argentina      1963   4000

Upvotes: 3

ej_f
ej_f

Reputation: 460

You can use the stack method:

>>> df=pd.DataFrame({"country":["United States","Argentina"],
1960:[1000,1000],
1961:[2000,2000],
1962:[3000,3000],
1963:[4000,4000]} )
>>> df
   1960  1961        country  1963  1962
0  1000  2000  United States  4000  3000
1  1000  2000      Argentina  4000  3000
>>> df.set_index("country").stack()
country
United States  1960    1000
               1961    2000
               1963    4000
               1962    3000
Argentina      1960    1000
               1961    2000
               1963    4000
               1962    3000
dtype: int64
>>> df.set_index("country").stack().reset_index()
         country  level_1     0
0  United States     1960  1000
1  United States     1961  2000
2  United States     1963  4000
3  United States     1962  3000
4      Argentina     1960  1000
5      Argentina     1961  2000
6      Argentina     1963  4000
7      Argentina     1962  3000

I hope this can help you

Upvotes: 3

divandc
divandc

Reputation: 149

This is not exactly what you wanted, but with df.stack() you can get the following:

0  Country    United States
    1960               1000
    1961               2000
    1962               3000
    1963               2300
1  Country        Argentina
    1960               1000
    1961               2000
    1962               3000
    1963               4000

Upvotes: 0

Related Questions