SimonBiggs
SimonBiggs

Reputation: 934

When saving a pandas dataframe to csv how do I retain the columns.name?

Initial Problem

When I run the following in ipython

import numpy as np
import pandas as pd

df = pd.DataFrame(np.round(9*np.random.rand(4,4), decimals=1))
df.index.name = 'x'
df.columns.name = 'y'

df.to_csv('output.csv')

df

it outputs the following result:

y    0    1    2    3
x                    
0  7.6  7.4  0.3  7.5
1  5.6  0.0  1.5  5.9
2  7.1  2.1  0.0  0.9
3  3.7  6.6  3.3  8.4

However when I open output.csv the "y" is removed:

x   0   1   2   3
0   7.6 7.4 0.3 7.5
1   5.6 0   1.5 5.9
2   7.1 2.1 0   0.9
3   3.7 6.6 3.3 8.4

How do I make it so that the df.columns.name is retained when I output the dataframe to csv?

Crude workaround

Current crude work-around is me doing the following:

df.to_csv('output.csv', index_label = 'x|y')

Which results in output.csv reading:

x|y 0   1   2   3
0   7.6 7.4 0.3 7.5
1   5.6 0   1.5 5.9
2   7.1 2.1 0   0.9
3   3.7 6.6 3.3 8.4

Something better would be great! Thanks for your help (in advance).

Context

This is what I am working on: https://github.com/SimonBiggs/Electron-Cutout-Factors

This is an example table: https://github.com/SimonBiggs/Electron-Cutout-Factors/blob/master/output/20140807_173714/06app06eng/interpolation-table.csv

Upvotes: 6

Views: 22905

Answers (3)

Jonathan Drucker
Jonathan Drucker

Reputation: 1

For some reason, it works fine if the column labels are a multiindex. This seems to be a pandas issue. A solution that works and isn't too ugly would be:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.round(9*np.random.rand(4,4), decimals=1))
df.index.name = 'x'
df.columns.name = 'y'
##### Add this line to create another column index level
df.columns = [df.columns, df.columns]

df.to_csv('output.csv')

##### When you read it in, specify that the first two lines are both headers
df2 = pd.read_csv('output.csv', index_col=0, header=[0,1])
##### Drop the extra level
df2.columns = df2.columns.droplevel(0)

df2

Upvotes: 0

WaveRider
WaveRider

Reputation: 495

You can pass a list to name the columns, then you can specify the index name when you are writing to csv:

df.columns = ['column_name1', 'column_name2', 'column_name3']
df.to_csv('/path/to/file.csv', index_label='Index_name')

Upvotes: 10

John Zwinck
John Zwinck

Reputation: 249444

How about this? It's slightly different but hopefully usable, since it fits the CSV paradigm:

>>> df.columns = ['y{}'.format(name) for name in df.columns]
>>> df.to_csv('output.csv')
>>> print open('output.csv').read()
x,y0,y1,y2,y3
0,3.5,1.5,1.6,0.3
1,7.0,4.7,6.5,5.2
2,6.6,7.6,3.2,5.5
3,4.0,2.8,7.1,7.8

Upvotes: 1

Related Questions