Arjun Sankarlal

Reputation: 3161

Pandas reads wrong column

I have a CSV file with the columns sentence, length, category and 18 more columns. I am trying to filter out specific columns.

Assume the last 10 columns are x, y, a, b, c, d, e, f, g, h. I am trying to filter out length, category and the last eight columns.

When I do it for the last 8 columns alone, as in

col_req = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
data = pd.read_csv('data.csv', names=col_req)

it works perfectly. But when I try

col_req = ['length','category','a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
data = pd.read_csv('data.csv', names=col_req)

the output is

('g', 'h', 'x', 'y', 'a', 'b', 'c', 'd', 'e', 'f')

I don't know where I am going wrong.

Upvotes: 1

Views: 594

Answers (3)

jpp

Reputation: 164843

I am trying to filter out length, category and the last eight columns.

If you want to filter by a combination of label-based and integer positional indices, you can read your column labels first, calculate your required labels, and then use the result when you read your data:

import pandas as pd

# use nrows=0 to read in only the column labels
cols_all = pd.read_csv('data.csv', nrows=0).columns
cols_req = ['length', 'category'] + cols_all[-8:].tolist()

# use the usecols parameter to filter by the specified labels
df = pd.read_csv('data.csv', usecols=cols_req)

This assumes, of course, your labels are unique.
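
As a self-contained illustration of the two-step read above, here is a minimal sketch that uses an in-memory CSV instead of data.csv. The sample rows and the shortened column list are made up for the example; only the column names from the question are kept.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv (made-up rows, fewer columns than the real file).
csv_text = (
    "sentence,length,category,x,y,a,b,c,d,e,f,g,h\n"
    "hello world,11,greeting,0,1,2,3,4,5,6,7,8,9\n"
    "good bye,8,farewell,9,8,7,6,5,4,3,2,1,0\n"
)

# Step 1: read only the header row to get the column labels.
cols_all = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Step 2: combine the two named columns with the last eight labels by position.
cols_req = ['length', 'category'] + cols_all[-8:].tolist()

# Step 3: read the data, keeping only the requested columns.
df = pd.read_csv(io.StringIO(csv_text), usecols=cols_req)
print(df.columns.tolist())
```

Note that the resulting columns follow the order in the file, not the order of the list passed to usecols.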

Upvotes: 0

Venkatesh Garnepudi

Reputation: 316

Check this answer. The column names might not be correct.

df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
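
One way the names can fail to match is a space after each comma in the header. The sketch below assumes that situation, with a made-up in-memory CSV and a hypothetical fields list; it is an illustration, not a diagnosis of the asker's file.

```python
import io
import pandas as pd

# Hypothetical CSV with a space after each delimiter.
csv_text = "length, category, a\n5, noun, 1\n7, verb, 2\n"

# Read as-is: the labels keep their leading spaces.
raw_cols = pd.read_csv(io.StringIO(csv_text)).columns.tolist()
print(raw_cols)

# 'fields' is a hypothetical list of the columns to keep.
fields = ['length', 'a']

# skipinitialspace=True drops the space after each delimiter,
# so the clean labels in 'fields' match the header.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, usecols=fields)
print(df.columns.tolist())
```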

Upvotes: 0

Jeril

Reputation: 8561

You need to use the usecols argument to do that:

col_req = ['length','category','a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
data = pd.read_csv('data.csv', usecols=col_req)
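
To show why the two arguments behave differently, here is a small sketch on a made-up three-column CSV. In pandas the keyword is spelled usecols; names only assigns labels to the columns it reads, and when names is passed without header, the file's own header line is read as a data row.

```python
import io
import pandas as pd

# Hypothetical three-column file with a header line.
csv_text = "length,category,a\n5,noun,1\n7,verb,2\n"

# names= relabels the columns; it does not select any.
# The original header line ends up as the first data row.
relabeled = pd.read_csv(io.StringIO(csv_text), names=['p', 'q', 'r'])
print(relabeled)

# usecols= selects columns by their labels in the file's header.
selected = pd.read_csv(io.StringIO(csv_text), usecols=['length', 'a'])
print(selected)
```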

Upvotes: 2
