pandas read_csv usecols and names out of sync

Question

When I try to read a subset of columns by index from a tabular file with pandas read_csv, usecols and names seem to get out of sync with each other.

For example, having the file test.csv:

FOO A   -46450.494736   0.0728830817231
FOO A   -46339.7126846  0.0695018062805
FOO A   -46322.4942905  0.0866205763556
FOO B   -46473.3117983  0.0481618121947
FOO B   -46537.6827055  0.0436893868921
FOO B   -46467.2102205  0.0485001911304
BAR C   -33424.1224914  6.7981041851
BAR C   -33461.4101485  7.40607068177
BAR C   -33404.6396495  4.72117502707

and trying to read 3 columns by index without preserving the original order:

import pandas as pd

cols = [1, 2, 0]
names = ['X', 'Y', 'Z']

# test.csv is tab-separated and has no header row
df = pd.read_csv(
    'test.csv', sep='\t',
    header=None,
    index_col=None,
    usecols=cols, names=names)

I'm getting the following dataframe:

     X  Y             Z
0  FOO  A -46450.494736
1  FOO  A -46339.712685
2  FOO  A -46322.494290
3  FOO  B -46473.311798
4  FOO  B -46537.682706
5  FOO  B -46467.210220
6  BAR  C -33424.122491
7  BAR  C -33461.410148
8  BAR  C -33404.639650

whereas I would expect column Z to contain FOO and BAR, like this:

     Z  X             Y
0  FOO  A -46450.494736
1  FOO  A -46339.712685
2  FOO  A -46322.494290
3  FOO  B -46473.311798
4  FOO  B -46537.682706
5  FOO  B -46467.210220
6  BAR  C -33424.122491
7  BAR  C -33461.410148
8  BAR  C -33404.639650

I know pandas stores dataframes as dictionaries, so the order of the columns may differ from the order requested with usecols, but the problem here is that the names end up on the wrong columns, so combining usecols indices with names doesn't make sense.

I really need to read the columns by their indices and then assign names to them. Is there any workaround for this?
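
One possible workaround, offered as a sketch rather than a definitive answer: read_csv appears to treat usecols as an unordered selection, so the names are matched to the selected columns in file order. Leaving names out of the read_csv call keeps the integer positions as column labels, and the columns can then be reordered and renamed afterwards. The helper name read_columns_by_index and the two-row in-memory sample are made up for illustration; they are not part of pandas or of the original file.

```python
import io

import pandas as pd


def read_columns_by_index(source, cols, names, sep="\t"):
    """Hypothetical helper: read columns by position, then name them in the order of `cols`."""
    # Without `names`, header=None labels each selected column with its integer position.
    df = pd.read_csv(source, sep=sep, header=None, index_col=None, usecols=cols)
    # The order of `cols` is not applied by usecols, so reorder explicitly.
    df = df[cols]
    # Now names[i] is attached to column cols[i].
    df.columns = names
    return df


# Two rows shaped like test.csv, tab-separated (sample data for illustration only).
sample = io.StringIO(
    "FOO\tA\t-46450.494736\t0.0728830817231\n"
    "BAR\tC\t-33424.1224914\t6.7981041851\n"
)

df = read_columns_by_index(sample, cols=[1, 2, 0], names=["X", "Y", "Z"])
print(df)
```

With this approach the columns come back in the order X, Y, Z, and Z holds the FOO/BAR values from column 0 of the file. Passing 'test.csv' instead of the in-memory sample should work the same way, assuming the file really is tab-separated.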
