Pivot a dataframe in python

Question

I have the below csv file

And after reading it with python in a DataFrame,I want to reshape it as
country,year1,value
country,year2,value

Taking into account only the country the year and the value.. So I will end up with 3 variables.

You can find the data set here:
http://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG

jezrael · Accepted Answer

You can use:

read_csv with omit first 4 rows by skiprows
check if last column has all NaN values by isnull with all
all values are NaNs, so you can omit it, one possible solution is use iloc for select first 61 columns
reshape by set_index with unstack
last reset_index with rename column by dict

df = pd.read_csv('API_NY.GDP.MKTP.KD.ZG_DS2_en_csv_v2.csv', skiprows=4)
print (df.head())
  Country Name Country Code         Indicator Name     Indicator Code  1960  \
0        Aruba          ABW  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG   NaN   
1  Afghanistan          AFG  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG   NaN   
2       Angola          AGO  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG   NaN   
3      Albania          ALB  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG   NaN   
4      Andorra          AND  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG   NaN   

   1961  1962  1963  1964  1965     ...            2008       2009      2010  \
0   NaN   NaN   NaN   NaN   NaN     ...       -6.881302  -5.653502       NaN   
1   NaN   NaN   NaN   NaN   NaN     ...        3.611368  21.020649  8.433290   
2   NaN   NaN   NaN   NaN   NaN     ...       13.817146   2.412870  3.407655   
3   NaN   NaN   NaN   NaN   NaN     ...        7.530000   3.350000  3.710000   
4   NaN   NaN   NaN   NaN   NaN     ...       -8.594256  -3.817986 -5.347977   

       2011       2012      2013      2014      2015      2016  Unnamed: 61  
0       NaN        NaN       NaN       NaN       NaN       NaN          NaN  
1  6.113685  14.434741  1.959123  1.312531  1.112558  2.232272          NaN  
2  3.918597   5.155441  6.813586  4.804473  3.006981  0.000000          NaN  
3  2.550000   1.420000  1.110000  1.800000  2.590000  3.460000          NaN  
4 -4.802675  -1.760010 -0.063514       NaN       NaN       NaN          NaN

cols = ['Country Name','Country Code','Indicator Name','Indicator Code']

print (df.iloc[:, 61].isnull().all())
True    

d = {'level_4':'year'}
df = df.iloc[:, :60]
       .set_index(cols)
       .stack()
       .reset_index(name='vals')
       .rename(columns=d)
print (df.head())
  Country Name Country Code         Indicator Name     Indicator Code  year  \
0        Aruba          ABW  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG  1995   
1        Aruba          ABW  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG  1996   
2        Aruba          ABW  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG  1997   
3        Aruba          ABW  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG  1998   
4        Aruba          ABW  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG  1999   

       vals  
0  1.245086  
1  7.814432  
2  6.666622  
3  1.154469  
4  4.514062

For 3 columns is solution very similar, for remove columns use drop:

df = pd.read_csv('API_NY.GDP.MKTP.KD.ZG_DS2_en_csv_v2.csv', skiprows=4)

d = {'level_1':'year'}
df= df.drop(['Country Code','Indicator Name','Indicator Code', 'Unnamed: 61'], axis=1)
df = df.set_index('Country Name').stack().reset_index(name='vals').rename(columns=d)
print (df.head())
  Country Name  year      vals
0        Aruba  1995  1.245086
1        Aruba  1996  7.814432
2        Aruba  1997  6.666622
3        Aruba  1998  1.154469
4        Aruba  1999  4.514062

Pivot a dataframe in python

Answers (2)

Related Questions