Reputation: 666
I have a large CSV file that is created through someone else's script with output columns named 'D', 'B', 'A', 'C'.
Is there a way to read it from CSV directly as 'A', 'B', 'C', 'D'.
Upvotes: 1
Views: 2144
Reputation: 11
Add .reset_index() in order to reset the in the index_col set index.
import pandas as pd
csv = pd.read_csv(data, index_col=['A', 'B', 'C', 'D'], usecols=['A', 'B', 'C', 'D']).reset_index()
Upvotes: 1
Reputation: 51
I found this solution that works with pandas 0.25.1
CSV file with output columns named 'D', 'B', 'A', 'C' Read it from CSV directly as 'A', 'B', 'C', 'D'
import pandas as pd
csv = pd.read_csv(data, index_col=['A', 'B', 'C', 'D'], usecols=['A', 'B', 'C', 'D'])
It reads the CSV columns in the given order into a pandas DataFrame named 'csv' Both the index_col=[] and the usecols=[] have to containe the same list order.
Upvotes: 2
Reputation: 11657
I assume you're not asking how to change column order in Pandas, but would like to change it before reading in, possibly to reduce memory use. (If not see @jezrael's answer.)
Best is to do it with csv
before reading into Pandas. Example:
import csv
with open('file.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:
fieldnames = ['A', 'B', 'C', 'D']
writer = csv.DictWriter(outfile, fieldnames=fieldnames)
writer.writeheader()
for row in csv.DictReader(infile):
writer.writerow(row)
Taken from here
In terms of time, quickest is doing it as CSV (I tested with a very small csv file):
%%time
df = pd.read_csv('r.csv')[['A', 'B', 'C', 'D']]
CPU times: user 3.53 ms, sys: 1.48 ms, total: 5 ms
Wall time: 3.79 ms
%%time
df=pd.read_csv('r.csv')[df.columns[::-1]]
CPU times: user 4.75 ms, sys: 2.01 ms, total: 6.76 ms
Wall time: 5.13 ms
%%time
cols=list('ABCD')
pd.concat([pd.read_csv('r.csv', usecols=[x]) for x in cols], axis=1)
CPU times: user 8.98 ms, sys: 2.25 ms, total: 11.2 ms
Wall time: 9.88 ms
%%time
df = pd.read_csv('r.csv').sort_index(axis=1)
CPU times: user 4.21 ms, sys: 1.88 ms, total: 6.09 ms
Wall time: 4.65 ms
%%time
with open('r.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:
fieldnames = ['A', 'B', 'C', 'D']
writer = csv.DictWriter(outfile, fieldnames=fieldnames)
writer.writeheader()
for row in csv.DictReader(infile):
writer.writerow(row)
df = pd.read_csv('reordered.csv')
CPU times: user 3.29 ms, sys: 1.91 ms, total: 5.2 ms
Wall time: 3.83 ms
Upvotes: 1
Reputation: 394159
One method would be to read a single column at a time and store in a list comprehension and concat
the result:
In[121]:
t="""D,B,A,C
4,2,1,3"""
cols = list('ABCD')
pd.concat([pd.read_csv(io.StringIO(t), usecols=[x]) for x in cols], axis=1)
Out[121]:
A B C D
0 1 2 3 4
However, I think that it's easier to just sort it by sub-selecting the columns in the order you want after reading normally:
df = df[['A','B','C','D']]
Upvotes: 1
Reputation: 76
Try pandas:
import pandas as pd
df=pd.read_csv(filename)
df=df[df.columns[::-1]]
Upvotes: 0
Reputation: 863166
I think read_csv
cannot do it, idea with usecols
also not working:
import pandas as pd
temp=u"""D;B;A;C
0;a;4;7;1
1;b;5;8;3
2;c;4;9;5
3;d;5;4;7
4;e;5;2;1
5;f;4;3;0"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", usecols=['A', 'B', 'C', 'D'])
print (df)
D B A C
0 0 a 4 7
1 1 b 5 8
2 2 c 4 9
3 3 d 5 4
4 4 e 5 2
5 5 f 4 3
Solution is change order by list or sorting index:
df = df[['A', 'B', 'C', 'D']]
print (df)
A B C D
0 4 a 7 0
1 5 b 8 1
2 4 c 9 2
3 5 d 4 3
4 5 e 2 4
5 4 f 3 5
df = df.sort_index(axis=1)
print (df)
A B C D
0 4 a 7 0
1 5 b 8 1
2 4 c 9 2
3 5 d 4 3
4 5 e 2 4
5 4 f 3 5
Upvotes: 1