Reshape Pandas Dataframe with duplicate Index

Question

Current Dataframe:

CountryName      IndicatorCode    Year         Value  
Arab World     TX.VAL.MRCH.RS.ZS  1960  1.646954e+01  
Arab World     TX.VAL.MRCH.R1.ZS  1960  2.260207e+00
Arab World     TX.VAL.MRCH.RS.ZS  1961  1.244584e+01
Arab World     TX.VAL.MRCH.R1.ZS  1961  1.860104e+00  
Zimbabwe       DT.DIS.OFFT.CD     2015  8.377700e+07
Zimbabwe       DT.INT.OFFT.CD     2015  2.321300e+07
Zimbabwe       DT.AMT.PROP.CD     2015  6.250000e+05

I want to convert each value of IndicatorCode column as different columns and these columns should contain data from the respective rows of Value column.
For example, after doing reshape:

CountryName Year TX.VAL.MRCH.RS.ZS TX.VAL.MRCH.R1.ZS  
Arab World  1960 1.646954e+01      2.260207e+00
Arab World  1961 1.244584e+01      1.860104e+00

Final Dataframe columns should be:

[CountryName, Year, TX.VAL.MRCH.RS.ZS, TX.VAL.MRCH.R1.ZS, DT.DIS.OFFT.CD,DT.INT.OFFT.CD, DT.AMT.PROP.CD]

I tried using pivot, but not success. I cannot take Country name as Index also since its not unique.

temp = indicators_df.pivot(columns='IndicatorCode',  values='Value')

Got ValueError: negative dimensions are not allowed

akuiper · Accepted Answer

You can use pivot_table which accepts multiple columns as index, values and columns:

df.pivot_table("Value", ["CountryName", "Year"], "IndicatorCode").reset_index()

Some explanation:

The parameters passed here are by positions, i.e, they are in the order of values, index, and columns or:

df.pivot_table(values = "Value", index = ["CountryName", "Year"], columns = "IndicatorCode").reset_index()

The values are what fill the cells of the final data frame, the index are the columns that get deduplicated and remain as columns in the result, the columns variables are ones that get pivoted to column headers in the result.

Reshape Pandas Dataframe with duplicate Index

Answers (2)

Related Questions