agent18
agent18

Reputation: 2297

Preserving column order in the pandas to_csv method

The to_csv method of pandas does not preserve the order of columns. It chooses to alphabetically arrange the columns in CSV. This is a bug and has been reported and is supposed to be corrected in version 0.11.0. I have 0.18.0.

import pandas as pd
df = pd.DataFrame({'V_pod_error' : [a],
                   'V_pod_used' : [b],
                   'U_sol_type' : [c]
                                ...
                                ... and so on upto 50 columns }

pd.to_csv(df)

Excel order:

0   U_sol type          V_pod_error      V_pod_used      ...
1

What I want is order in the dictionary:

0   V_pod_error      V_pod_used          U_sol type     ...
1

I have a huge number of columns and names. I cannot do it manually or write out the column order. There has been the exact same question in 2013 here. And it doesnt look like there is an update! I would like to ask the community to help me out! This is really problematic.

Upvotes: 17

Views: 19130

Answers (3)

Saranya
Saranya

Reputation: 756

Try the following solution. Even I faced the same issue. I solved it as follows:

import pandas as pd
df = pd.DataFrame({'V_pod_error' : [a],
                   'V_pod_used' : [b],
                   'U_sol_type' : [c]
                                ...
                                ... and so on upto 50 columns }

column_order = ['V_pod_error', 'V_pod_used', 'U_sol_type',.....# upto 50 column names]

df[column_order].to_csv(file_name)

Upvotes: 23

Biranchi
Biranchi

Reputation: 16327

Try with:

df.to_csv(file_name, sep=',', encoding='utf-8', header=True, columns=["Col1","Col2","Col3","Col4"])

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html

Upvotes: 4

jezrael
jezrael

Reputation: 863166

I think problem is in DataFrame constructor, because you need add parameter columns for custom ordering of columns. If you dont set parameter columns, columns are ordered alphanumerical.

import pandas as pd
df = pd.DataFrame({'V_pod_error' : [0,2],
                   'V_pod_used' : [6,4],
                   'U_sol_type' : [7,8]})
print df
   U_sol_type  V_pod_error  V_pod_used
0           7            0           6
1           8            2           4

print df.to_csv()
,U_sol_type,V_pod_error,V_pod_used
0,7,0,6
1,8,2,4


df1 = pd.DataFrame({'V_pod_error' : [0,2],
                   'V_pod_used' : [6,4],
                   'U_sol_type' : [7,8]}, 
                    columns=['V_pod_error','V_pod_used','U_sol_type'])

print df1
   V_pod_error  V_pod_used  U_sol_type
0            0           6           7
1            2           4           8

print df1.to_csv()
,V_pod_error,V_pod_used,U_sol_type
0,0,6,7
1,2,4,8

EDIT:

Another solution is set order of column by subset before write to_csv (thanks Mathias711):

import pandas as pd
df = pd.DataFrame({'V_pod_error' : [0,2],
                   'V_pod_used' : [6,4],
                   'U_sol_type' : [7,8]})
print df
   U_sol_type  V_pod_error  V_pod_used
0           7            0           6
1           8            2           4

df = df[['V_pod_error','V_pod_used','U_sol_type']]
print df

   V_pod_error  V_pod_used  U_sol_type
0            0           6           7
1            2           4           8

EDIT1: Maybe help first convert dict to OrderedDict and then create DataFrame:

import collections
import pandas as pd


d = {'V_pod_error' : [0,2],'V_pod_used' : [6,4], 'U_sol_type' : [7,8]}
print d
{'V_pod_error': [0, 2], 'V_pod_used': [6, 4], 'U_sol_type': [7, 8]}

print pd.DataFrame(d)
   U_sol_type  V_pod_error  V_pod_used
0           7            0           6
1           8            2           4

d1 = collections.OrderedDict(d)
print d1
OrderedDict([('V_pod_error', [0, 2]), ('V_pod_used', [6, 4]), ('U_sol_type', [7, 8])])

print pd.DataFrame(d1)
   V_pod_error  V_pod_used  U_sol_type
0            0           6           7
1            2           4           8

Upvotes: 9

Related Questions