sara r.
sara r.

Reputation: 43

reshape dataframe into long format python

I am working with a CSV data set called newdata.csv that has the Corruption Perceptions Index ratings for 186 countries from the years 1997 to 2015. Here is a small snapshot of it:

Small snapshot data

Here, the years stretch out to the right. I want to put the data into long format that I can work with in python. to do this, I tried to use the melt function of pandas.This code:

import pandas as pd
data = pd.read_csv("newdata.csv", encoding = "ISO-8859-1")
a = pd.melt(data, id_vars='Country', value_vars=[1997, 1998, 1999, 
2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 
2012, 2013, 2014, 2015])

Then I export it as a new CSV that i can look at in excel, with this code:

a.to_csv('Pythonoutput.csv', sep=',')

When I do this, I get the country column and the year column as I want them, but not any of the data. Here is my result:

enter image description here How do I get my CPI values to display downward as well as the years?

Upvotes: 2

Views: 211

Answers (1)

ascripter
ascripter

Reputation: 6213

First, read your data

df = pd.read_csv("newdata.csv", encoding = "ISO-8859-1")

The index of your data is the Country, so let the dataframe know it via set_index()

df = df.set_index('Country')

To stack the colums all together, use stack()

df = df.stack()

By stacking, the DataFrame automatically got a MultiIndex with 2 levels. Finally rename the index. The DataFrame could infer Country as name for the first level, but the second level doesn't have a label yet:

df.index.names = ['Country', 'Year']

And now you're ready to export it via to_csv(). Give the series a name with the header-argument

df.to_csv("Pythonoutput.csv", header=['CPI'])

Putting it all together:

df = pd.read_csv("newdata.csv", encoding = "ISO-8859-1").set_index('Country').stack()
df.index.names = ['Country', 'Year']
df.to_csv("Pythonoutput.csv", header=['CPI'])

Upvotes: 1

Related Questions