Reputation: 3161
I have a CSV file with the columns sentence, length, category and 18 more columns. I am trying to filter out specific columns.
Assume I have x,y,a,b,c,d,e,f,g,h as the last 10 columns. I am trying to filter out length, category and the last eight columns.
When I do it for the last 8 columns alone, as
col_req = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
data = pd.read_csv('data.csv', names=col_req)
it works perfectly. But when I try
col_req = ['length','category','a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
data = pd.read_csv('data.csv', names=col_req)
the output is,
('g', 'h', 'x', 'y', 'a', 'b', 'c', 'd', 'e', 'f')
I don't know where I am going wrong.
Upvotes: 1
Views: 594
Reputation: 164843
I am trying to filter out length, category and the last eight columns.
If you want to filter by a combination of label-based and integer positional indices, you can read your column labels first, calculate your required labels, and then use the result when you read your data:
import pandas as pd

# use nrows=0 to read in only the column labels
cols_all = pd.read_csv('data.csv', nrows=0).columns
cols_req = ['length', 'category'] + cols_all[-8:].tolist()
# use the usecols parameter to filter by the specified labels
df = pd.read_csv('data.csv', usecols=cols_req)
This assumes, of course, your labels are unique.
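To try the two-step read without the original file, here is a minimal sketch on an in-memory CSV. The sample data and the shortened column list are made up for illustration (only the last three columns are taken by position instead of eight); the pandas calls are the same `nrows=0` and `usecols` ones described above.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data.csv
csv_text = (
    "sentence,length,category,x,y,a,b,c\n"
    "hello world,11,greeting,1,2,3,4,5\n"
    "good bye,8,farewell,6,7,8,9,10\n"
)

# Read only the header row to learn the column labels
cols_all = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Two columns picked by label plus the last three picked by position
cols_req = ['length', 'category'] + cols_all[-3:].tolist()

# Read the data again, keeping only the requested columns
df = pd.read_csv(io.StringIO(csv_text), usecols=cols_req)
print(df.columns.tolist())
```

Note that `usecols` selects columns but keeps them in file order, so if you need a different order you would reindex afterwards, e.g. `df[cols_req]`.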
Upvotes: 0
Reputation: 316
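Check this answer. It might be that the column names aren't correct, for example because they contain leading spaces after the delimiter, which skipinitialspace=True strips.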
df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
Upvotes: 0
Reputation: 8561
You need to use the usecols argument to do that. The names argument only assigns labels to columns by position; it does not select columns.
col_req = ['length','category','a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
data = pd.read_csv('data.csv', usecols=col_req)
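As a rough illustration of the difference, the sketch below reads a small made-up CSV both ways. The sample data and the labels `c1`, `c2`, `c3` are hypothetical; the point is only that `names` relabels by position (and then treats the file's header line as data), while `usecols` selects by the labels already in the header.

```python
import io

import pandas as pd

# Hypothetical three-column file with a header row
csv_text = "sentence,length,a\nhello,5,1\nbye,3,2\n"

# names= relabels columns by position; the file's own header row
# is then read as an ordinary data row
relabelled = pd.read_csv(io.StringIO(csv_text), names=['c1', 'c2', 'c3'])
print(relabelled.iloc[0].tolist())

# usecols= keeps the header and selects columns by label
selected = pd.read_csv(io.StringIO(csv_text), usecols=['length', 'a'])
print(selected.columns.tolist())
```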
Upvotes: 2