Reputation: 778
I have a pandas dataframe with years as columns, countries as row names:
Country | 1960 | 1961 | 1962 | 1963
-----------------------------------------
United States | 1000 | 2000 | 3000 | 4000
-----------------------------------------
Argentina | 1000 | 2000 | 3000 | 4000
-----------------------------------------
I would like to transform it into:
Country | Year | Value
-----------------------------
Unites States | 1960 | 1000
Unites States | 1961 | 2000
Unites States | 1962 | 3000
Unites States | 1963 | 4000
Argentina | 1960 | 1000
Argentina | 1961 | 2000
Argentina | 1962 | 3000
Argentina | 1963 | 4000
I am not sure what split, sort or group operations need to be applied to achieve this goal.
Thank you!
Upvotes: 2
Views: 1001
Reputation: 18521
Just to give a complete example,
In [1]: df = pd.DataFrame([['United States', 1000, 2000, 3000, 4000],
['Argentina', 1000, 2000, 3000, 4000]],
columns=['Country', 1960, 1961, 1962, 1963])
In [2]: df.set_index('Country', inplace=True)
In [3]: df = df.stack().reset_index()
In [4]: df.columns = ['Country', 'Year', 'Value']
Yields
Country Year Value
0 United States 1960 1000
1 United States 1961 2000
2 United States 1962 3000
3 United States 1963 4000
4 Argentina 1960 1000
5 Argentina 1961 2000
6 Argentina 1962 3000
7 Argentina 1963 4000
To get rid of the index column and use the Country column as the index, you can use
In [3]: df = df.stack().reset_index(1)
In [4]: df.columns = ['Year', 'Value']
which yields
Year Value
Country
United States 1960 1000
United States 1961 2000
United States 1962 3000
United States 1963 4000
Argentina 1960 1000
Argentina 1961 2000
Argentina 1962 3000
Argentina 1963 4000
Upvotes: 3
Reputation: 460
You can use the stack method:
>>> df=pd.DataFrame({"country":["United States","Argentina"],
1960:[1000,1000],
1961:[2000,2000],
1962:[3000,3000],
1963:[4000,4000]} )
>>> df
1960 1961 country 1963 1962
0 1000 2000 United States 4000 3000
1 1000 2000 Argentina 4000 3000
>>> df.set_index("country").stack()
country
United States 1960 1000
1961 2000
1963 4000
1962 3000
Argentina 1960 1000
1961 2000
1963 4000
1962 3000
dtype: int64
>>> df.set_index("country").stack().reset_index()
country level_1 0
0 United States 1960 1000
1 United States 1961 2000
2 United States 1963 4000
3 United States 1962 3000
4 Argentina 1960 1000
5 Argentina 1961 2000
6 Argentina 1963 4000
7 Argentina 1962 3000
I hope this can help you
Upvotes: 3
Reputation: 149
This is not exactly what you wanted, but with df.stack()
you can get the following:
0 Country United States
1960 1000
1961 2000
1962 3000
1963 2300
1 Country Argentina
1960 1000
1961 2000
1962 3000
1963 4000
Upvotes: 0