Zusman
Zusman

Reputation: 666

read a csv to df with a different columns order

I have a large CSV file that is created through someone else's script with output columns named 'D', 'B', 'A', 'C'.
Is there a way to read it from CSV directly as 'A', 'B', 'C', 'D'.

Upvotes: 1

Views: 2144

Answers (6)

Joe
Joe

Reputation: 11

Add .reset_index() in order to reset the in the index_col set index.

import pandas as pd
csv = pd.read_csv(data, index_col=['A', 'B', 'C', 'D'], usecols=['A', 'B', 'C', 'D']).reset_index() 

Upvotes: 1

Joe
Joe

Reputation: 51

I found this solution that works with pandas 0.25.1

CSV file with output columns named 'D', 'B', 'A', 'C' Read it from CSV directly as 'A', 'B', 'C', 'D'

import pandas as pd
csv = pd.read_csv(data, index_col=['A', 'B', 'C', 'D'], usecols=['A', 'B', 'C', 'D'])

It reads the CSV columns in the given order into a pandas DataFrame named 'csv' Both the index_col=[] and the usecols=[] have to containe the same list order.

Upvotes: 2

Josh Friedlander
Josh Friedlander

Reputation: 11657

I assume you're not asking how to change column order in Pandas, but would like to change it before reading in, possibly to reduce memory use. (If not see @jezrael's answer.)

Best is to do it with csv before reading into Pandas. Example:

import csv

with open('file.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:
    fieldnames = ['A', 'B', 'C', 'D']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in csv.DictReader(infile):
        writer.writerow(row)

Taken from here

In terms of time, quickest is doing it as CSV (I tested with a very small csv file):

%%time
df = pd.read_csv('r.csv')[['A', 'B', 'C', 'D']]

CPU times: user 3.53 ms, sys: 1.48 ms, total: 5 ms
Wall time: 3.79 ms

%%time
df=pd.read_csv('r.csv')[df.columns[::-1]]

CPU times: user 4.75 ms, sys: 2.01 ms, total: 6.76 ms
Wall time: 5.13 ms

%%time
cols=list('ABCD')
pd.concat([pd.read_csv('r.csv', usecols=[x]) for x in cols], axis=1)

CPU times: user 8.98 ms, sys: 2.25 ms, total: 11.2 ms
Wall time: 9.88 ms

%%time
df = pd.read_csv('r.csv').sort_index(axis=1)

CPU times: user 4.21 ms, sys: 1.88 ms, total: 6.09 ms
Wall time: 4.65 ms

%%time
with open('r.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:

    fieldnames = ['A', 'B', 'C', 'D']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in csv.DictReader(infile):
        writer.writerow(row)
df = pd.read_csv('reordered.csv')

CPU times: user 3.29 ms, sys: 1.91 ms, total: 5.2 ms
Wall time: 3.83 ms

Upvotes: 1

EdChum
EdChum

Reputation: 394159

One method would be to read a single column at a time and store in a list comprehension and concat the result:

In[121]:
t="""D,B,A,C
4,2,1,3"""
cols = list('ABCD')
pd.concat([pd.read_csv(io.StringIO(t), usecols=[x]) for x in cols], axis=1)

Out[121]: 
   A  B  C  D
0  1  2  3  4

However, I think that it's easier to just sort it by sub-selecting the columns in the order you want after reading normally:

df = df[['A','B','C','D']]

Upvotes: 1

Vasyl Kushnir
Vasyl Kushnir

Reputation: 76

Try pandas:

import pandas as pd
df=pd.read_csv(filename)
df=df[df.columns[::-1]]

Upvotes: 0

jezrael
jezrael

Reputation: 863166

I think read_csv cannot do it, idea with usecols also not working:

import pandas as pd

temp=u"""D;B;A;C
0;a;4;7;1
1;b;5;8;3
2;c;4;9;5
3;d;5;4;7
4;e;5;2;1
5;f;4;3;0"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", usecols=['A', 'B', 'C', 'D'])
print (df)
   D  B  A  C
0  0  a  4  7
1  1  b  5  8
2  2  c  4  9
3  3  d  5  4
4  4  e  5  2
5  5  f  4  3

Solution is change order by list or sorting index:

df = df[['A', 'B', 'C', 'D']]
print (df)
   A  B  C  D
0  4  a  7  0
1  5  b  8  1
2  4  c  9  2
3  5  d  4  3
4  5  e  2  4
5  4  f  3  5

df = df.sort_index(axis=1)
print (df)
   A  B  C  D
0  4  a  7  0
1  5  b  8  1
2  4  c  9  2
3  5  d  4  3
4  5  e  2  4
5  4  f  3  5

Upvotes: 1

Related Questions