Crazy Guy

Reputation: 31

Pandas read_csv usecols and names not working properly

I am reading a csv file in pandas without headers. My problem is that when I hard-code the values in usecols and names it works fine, but when I take the cols and names lists as input from JSON files, the column names and rows are mismatched. I've been stuck on this issue for a long time.

This doesn't work:

temp = pd.read_csv(file_name, header=None ,
                   skiprows=[0], engine='python',
                   sep=' |,',
                   usecols=cols,
                   names=names)  
                          

But this works:

temp = pd.read_csv(file_name, header=None ,
                   skiprows=[0], engine='python',
                   sep=' |,',
                   usecols=[10,12,14],
                   names=['a', 'b', 'c'])

Consider this csv:

X-Y-Z,12.67,India,
A-X-o,11.43,China,
D-F-L,6.09,Kenya.   

Now running this code

cols = [2,3,1]
names = ['price','country','code']

temp = pd.read_csv('test.csv', header=None ,
                   skiprows=[0], engine='python',
                   sep=' |,',
                   usecols = cols,
                   names= names) 

produces...

   price  country   code
0  A-X-o    11.43  China
1  D-F-L     6.09  Kenya

which is wrong, as countries are not decimals.

Upvotes: 2

Views: 13771

Answers (2)

geekzeus

Reputation: 895

import pandas as pd
fields = ['a', 'b','c']
# use skipinitialspace=True to strip the spaces after the delimiter so the header names match when selecting specific columns
df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
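
To see what skipinitialspace is doing, here is a small self-contained sketch with a hypothetical in-memory CSV (not the file from the question, which has no header row):

import io
import pandas as pd

# hypothetical CSV with spaces after each delimiter
raw = "a, b, c\n1, 2.5, x\n2, 3.5, y\n"

fields = ['b', 'c']
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True, usecols=fields)
# without skipinitialspace=True the header would parse as 'a', ' b', ' c'
# and usecols=['b', 'c'] would fail because ' b' != 'b'
print(df)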

EDIT:

You are doing it the wrong way: usecols=[2,3,1] and usecols=[1,2,3] are the same, because usecols does not change the column order. What matters is how you specify names.

import pandas as pd

data = {'letters': ['X-Y-Z', 'A-X-o', 'D-F-L'],
        'num': [12.67, 11.43, 6.09],
        'Country': ['India', 'China', 'Kenya']}

df = pd.DataFrame(data)
# to_csv also writes the index, so the file has four columns: 0=index, 1=letters, 2=num, 3=Country
df.to_csv('df.csv', header=False)

Let's consider some scenarios.

#this is what you have done
cols = [2,3,1]
names = ['price','country','code']
temp = pd.read_csv('df.csv', header=None ,skiprows=[0], engine='python', sep=',',usecols = cols, names= names)

OUTPUT

        price   country code
   1    A-X-o   11.43   China
   2    D-F-L   6.09    Kenya

Let's change cols to [1,2,3] and there will be no change in the output:

cols = [1,2,3]
names = ['price','country','code']
temp = pd.read_csv('df.csv', header=None ,skiprows=[0], engine='python', sep=',',usecols = cols, names= names)

OUTPUT

        price   country code
   1    A-X-o   11.43   China
   2    D-F-L   6.09    Kenya

Now let's change cols to [2,3,1] (or [1,2,3]) and names to ['code','price','country']:

cols = [2,3,1]
names = ['code','price','country']
temp = pd.read_csv('df.csv', header=None ,skiprows=[0], engine='python', sep=',', usecols = cols, names= names)

OUTPUT

    code    price   country
0   A-X-o   11.43   China
1   D-F-L   6.09    Kenya   

So just switching cols to [2,3,1] and assigning names accordingly as ['price','country','code'] will not give you different results, because usecols=[2,3,1] == usecols=[1,2,3]: the names are applied to the selected columns in their file order, not in the order you list them in usecols.
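
If the goal is price, country, code in that order with the right data in each column, a minimal sketch against the df.csv generated above is to name every column in file order, select by name, and reorder after reading (the 'row' label for column 0 is just a placeholder for the index that to_csv wrote out):

import pandas as pd

# name all four columns in their left-to-right file order
temp = pd.read_csv('df.csv', header=None, skiprows=[0], engine='python',
                   sep=',',
                   names=['row', 'code', 'price', 'country'],
                   usecols=['code', 'price', 'country'])

# usecols only selects; the column order has to be imposed explicitly
temp = temp[['price', 'country', 'code']]
print(temp)

Here names labels the columns as they appear in the file, usecols picks a subset by name, and the final indexing step gives the order the question asked for.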

Upvotes: 4

Zaynul Abadin Tuhin

Reputation: 32021

Simply use the file name with its location and pass the column names to usecols; it will work, see the docs:

temp = pd.read_csv(file_name, usecols=['a', 'b', 'c'])

On the other hand, you could read all the columns of the csv file and drop the unused columns like below:

temp = pd.read_csv(file_name)
temp.drop(['col1', 'col2', ..., 'coln'], axis=1, inplace=True)
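
For the header-less test.csv from the question, a minimal sketch of this read-everything-then-drop approach (assuming the three-line sample shown there, where the trailing commas create an empty fourth column) could be:

import pandas as pd

# no header row, so pandas labels the columns 0, 1, 2, 3
temp = pd.read_csv('test.csv', header=None, sep=',')

# drop the empty column produced by the trailing comma on each row
temp = temp.drop(columns=[3])

# assign names in the file's left-to-right order, then reorder as needed
temp.columns = ['code', 'price', 'country']
temp = temp[['price', 'country', 'code']]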

Upvotes: 1
