Reputation: 815
This is my code:
import os
import pandas as pd
path = os.getcwd()
files = os.listdir(path)
df = pd.DataFrame()
for f in files:
data = pd.read_csv(f, usecols = ['A','B','C','D','E'])
df = df.append(data)
I get the traceback:
File "C:\Users\dbhadra\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1134, in _validate_usecols_names
"columns expected but not found: {missing}".format(missing=missing)
ValueError: Usecols do not match columns, columns expected but not found: ['B', 'A', 'C', 'E', 'D']
I've used a similar command for read_excel
before and it works. Do I need to include the sep/engine
arguments? I don't see why. Also, can I use usecol = 'A:E'
as in read_excel? The csv files are quite vanilla. No tricks.
Upvotes: 0
Views: 3983
Reputation: 14273
Compare the docs for pandas.read_csv and pandas.read_excel
for read_csv()
:
usecols : list-like or callable, optional
Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']
and for read_excel()
usecols : int, str, list-like, or callable default None
Return a subset of the columns. * If None, then parse all columns. * If int, then indicates last column to be parsed.
Deprecated since version 0.24.0: Pass in a list of int instead from 0 to usecols inclusive.
If str, then indicates comma separated list of Excel column letters and column ranges (e.g. “A:E” or “A,C,E:F”). Ranges are inclusive of both sides.
If list of int, then indicates list of column numbers to be parsed.
If list of string, then indicates list of column names to be parsed.
so in your case pass either column names from header or list of integers (column indexes)
Upvotes: 1
Reputation: 10030
In Excel you have the main column naming (A-Z, AA-ZZ...). In CSV you haven't it. usecols
is the list of columns according to their headers (in the first line). If you have no headers in your CSV file, just remove usecols
, you can delete these columns later. Or you can specify columns as integers (like usecols=(1, 3, 7)
instead of usecols=(A, C, F)
. If you have headers, write their names in usecols
.
usecols : list-like or callable, optional
Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid list-like usecols parameter would be
[0, 1, 2]
or['foo', 'bar', 'baz']
. Element order is ignored, sousecols=[0, 1]
is the same as[1, 0]
. To instantiate a DataFrame from data with element order preserved usepd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]
for columns in['foo', 'bar']
order orpd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]
for['bar', 'foo']
order.If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be
lambda x: x.upper()
in['AAA', 'BBB', 'DDD']
. Using this parameter results in much faster parsing time and lower memory usage.
Upvotes: 2