timekeeper
timekeeper

Reputation: 718

Reshape Pandas Dataframe with duplicate Index

Current Dataframe:

CountryName      IndicatorCode    Year         Value  
Arab World     TX.VAL.MRCH.RS.ZS  1960  1.646954e+01  
Arab World     TX.VAL.MRCH.R1.ZS  1960  2.260207e+00
Arab World     TX.VAL.MRCH.RS.ZS  1961  1.244584e+01
Arab World     TX.VAL.MRCH.R1.ZS  1961  1.860104e+00  
Zimbabwe       DT.DIS.OFFT.CD     2015  8.377700e+07
Zimbabwe       DT.INT.OFFT.CD     2015  2.321300e+07
Zimbabwe       DT.AMT.PROP.CD     2015  6.250000e+05

I want to convert each value of IndicatorCode column as different columns and these columns should contain data from the respective rows of Value column.
For example, after doing reshape:

CountryName Year TX.VAL.MRCH.RS.ZS TX.VAL.MRCH.R1.ZS  
Arab World  1960 1.646954e+01      2.260207e+00
Arab World  1961 1.244584e+01      1.860104e+00

Final Dataframe columns should be:

[CountryName, Year, TX.VAL.MRCH.RS.ZS, TX.VAL.MRCH.R1.ZS, DT.DIS.OFFT.CD,DT.INT.OFFT.CD, DT.AMT.PROP.CD]  

I tried using pivot, but not success. I cannot take Country name as Index also since its not unique.

temp = indicators_df.pivot(columns='IndicatorCode',  values='Value')

Got ValueError: negative dimensions are not allowed

Upvotes: 1

Views: 1244

Answers (2)

akuiper
akuiper

Reputation: 215127

You can use pivot_table which accepts multiple columns as index, values and columns:

df.pivot_table("Value", ["CountryName", "Year"], "IndicatorCode").reset_index()

enter image description here

Some explanation:

The parameters passed here are by positions, i.e, they are in the order of values, index, and columns or:

df.pivot_table(values = "Value", index = ["CountryName", "Year"], columns = "IndicatorCode").reset_index()

The values are what fill the cells of the final data frame, the index are the columns that get deduplicated and remain as columns in the result, the columns variables are ones that get pivoted to column headers in the result.

Upvotes: 4

piRSquared
piRSquared

Reputation: 294526

set_index + unstack

s = df.set_index(['CountryName', 'Year', 'IndicatorCode']).Value
s.unstack().reset_index().rename_axis([None], 1)

enter image description here

Upvotes: 1

Related Questions