Reputation: 718
Current Dataframe:
CountryName IndicatorCode Year Value
Arab World TX.VAL.MRCH.RS.ZS 1960 1.646954e+01
Arab World TX.VAL.MRCH.R1.ZS 1960 2.260207e+00
Arab World TX.VAL.MRCH.RS.ZS 1961 1.244584e+01
Arab World TX.VAL.MRCH.R1.ZS 1961 1.860104e+00
Zimbabwe DT.DIS.OFFT.CD 2015 8.377700e+07
Zimbabwe DT.INT.OFFT.CD 2015 2.321300e+07
Zimbabwe DT.AMT.PROP.CD 2015 6.250000e+05
I want to convert each value of IndicatorCode column as different columns and these columns should contain data from the respective rows of Value column.
For example, after doing reshape:
CountryName Year TX.VAL.MRCH.RS.ZS TX.VAL.MRCH.R1.ZS
Arab World 1960 1.646954e+01 2.260207e+00
Arab World 1961 1.244584e+01 1.860104e+00
Final Dataframe columns should be:
[CountryName, Year, TX.VAL.MRCH.RS.ZS, TX.VAL.MRCH.R1.ZS, DT.DIS.OFFT.CD,DT.INT.OFFT.CD, DT.AMT.PROP.CD]
I tried using pivot, but not success. I cannot take Country name as Index also since its not unique.
temp = indicators_df.pivot(columns='IndicatorCode', values='Value')
Got ValueError: negative dimensions are not allowed
Upvotes: 1
Views: 1244
Reputation: 215127
You can use pivot_table
which accepts multiple columns as index, values and columns:
df.pivot_table("Value", ["CountryName", "Year"], "IndicatorCode").reset_index()
Some explanation:
The parameters passed here are by positions, i.e, they are in the order of values, index, and columns
or:
df.pivot_table(values = "Value", index = ["CountryName", "Year"], columns = "IndicatorCode").reset_index()
The values are what fill the cells of the final data frame, the index are the columns that get deduplicated and remain as columns in the result, the columns variables are ones that get pivoted to column headers in the result.
Upvotes: 4
Reputation: 294526
set_index
+ unstack
s = df.set_index(['CountryName', 'Year', 'IndicatorCode']).Value
s.unstack().reset_index().rename_axis([None], 1)
Upvotes: 1