When trying to read some columns by their indices from a tabular file with pandas read_csv, it seems that usecols and names get out of sync with each other.
For example, given the tab-separated file test.csv:
FOO A -46450.494736 0.0728830817231
FOO A -46339.7126846 0.0695018062805
FOO A -46322.4942905 0.0866205763556
FOO B -46473.3117983 0.0481618121947
FOO B -46537.6827055 0.0436893868921
FOO B -46467.2102205 0.0485001911304
BAR C -33424.1224914 6.7981041851
BAR C -33461.4101485 7.40607068177
BAR C -33404.6396495 4.72117502707
and trying to read 3 columns by index without preserving the original order:
import pandas as pd

cols = [1, 2, 0]
names = ['X', 'Y', 'Z']
df = pd.read_csv(
    'test.csv', sep='\t',
    header=None,
    index_col=None,
    usecols=cols, names=names)
I'm getting the following dataframe:
X Y Z
0 FOO A -46450.494736
1 FOO A -46339.712685
2 FOO A -46322.494290
3 FOO B -46473.311798
4 FOO B -46537.682706
5 FOO B -46467.210220
6 BAR C -33424.122491
7 BAR C -33461.410148
8 BAR C -33404.639650
whereas I would expect column Z to contain FOO and BAR, like this:
Z X Y
0 FOO A -46450.494736
1 FOO A -46339.712685
2 FOO A -46322.494290
3 FOO B -46473.311798
4 FOO B -46537.682706
5 FOO B -46467.210220
6 BAR C -33424.122491
7 BAR C -33461.410148
8 BAR C -33404.639650
I know pandas stores DataFrames much like a dictionary, so the column order may differ from the order requested in usecols, but the problem here is that combining usecols indices with names doesn't make sense: the names do not end up on the columns I selected them for.
I really need to read the columns by their indices and then assign names to them. Is there any workaround for this?
The documentation could be clearer on this (feel free to open an issue, or even better, submit a pull request!), but usecols is set-like: it does not define an order of columns, it is simply tested for membership.
import pandas as pd
from io import StringIO
pd.read_csv(StringIO("""a,b,c
1,2,3
4,5,6"""), usecols=[0, 1, 2])
Out[31]:
a b c
0 1 2 3
1 4 5 6
pd.read_csv(StringIO("""a,b,c
1,2,3
4,5,6"""), usecols=[2, 1, 0])
Out[32]:
a b c
0 1 2 3
1 4 5 6
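Because usecols only selects columns, one possible workaround is sketched below. It is not part of the original answer: read the columns without names, restore the requested order using the integer labels that header=None produces, and only then assign the names. It assumes the same tab-separated layout as test.csv; two of its rows are inlined so the example is self-contained, and the variable names are illustrative.

```python
import io

import pandas as pd

# Two rows of the tab-separated test.csv from the question, inlined for a self-contained demo.
data = "FOO\tA\t-46450.494736\t0.0728830817231\nBAR\tC\t-33424.1224914\t6.7981041851\n"

cols = [1, 2, 0]         # column positions, in the order we want them
names = ['X', 'Y', 'Z']  # one name per entry of cols, position by position

# With header=None and no names, the selected columns are labelled 0, 1, 2 (their file positions).
df = pd.read_csv(io.StringIO(data), sep='\t', header=None, usecols=cols)

# Reorder by those integer labels, then attach the names.
df = df[cols]
df.columns = names
print(df)
```

Here X holds the original column 1 (A, C), Y the floats from column 2, and Z the FOO/BAR values from column 0, so each index keeps the name it was paired with.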
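names, on the other hand, is ordered: the names are assigned to the selected columns in the order those columns appear in the file. So in this case, the answer is to specify the names in that order, i.e. names=['Z', 'X', 'Y'] for columns 0, 1 and 2.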
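A minimal sketch of that fix, assuming the tab-separated layout of test.csv (two of its rows are inlined here instead of reading the file). The order of usecols does not matter; names is listed in file order so that column 0 becomes Z, column 1 becomes X and column 2 becomes Y.

```python
import io

import pandas as pd

# Two rows of the tab-separated test.csv from the question, inlined for a self-contained demo.
data = "FOO\tA\t-46450.494736\t0.0728830817231\nBAR\tC\t-33424.1224914\t6.7981041851\n"

# usecols only selects columns 0, 1 and 2; its order is ignored.
# names is applied to the selected columns in file order:
# column 0 -> 'Z', column 1 -> 'X', column 2 -> 'Y'.
df = pd.read_csv(io.StringIO(data), sep='\t', header=None,
                 usecols=[1, 2, 0], names=['Z', 'X', 'Y'])
print(df)
```

This produces the Z X Y layout the question expected, with FOO and BAR in column Z.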