Pandas read_csv

Question

This is my code:

import os 
import pandas as pd

path = os.getcwd()
files = os.listdir(path)
df = pd.DataFrame()

for f in files:
    data = pd.read_csv(f, usecols = ['A','B','C','D','E'])
    df = df.append(data)

I get the traceback:

  File "C:\Users\dbhadra\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1134, in _validate_usecols_names
    "columns expected but not found: {missing}".format(missing=missing)

ValueError: Usecols do not match columns, columns expected but not found: ['B', 'A', 'C', 'E', 'D']

I've used a similar command for read_excel before and it works. Do I need to include the sep/engine arguments? I don't see why. Also, can I use usecol = 'A:E' as in read_excel? The csv files are quite vanilla. No tricks.

vurmux · Accepted Answer

In Excel you have the main column naming (A-Z, AA-ZZ...). In CSV you haven't it. usecols is the list of columns according to their headers (in the first line). If you have no headers in your CSV file, just remove usecols, you can delete these columns later. Or you can specify columns as integers (like usecols=(1, 3, 7) instead of usecols=(A, C, F). If you have headers, write their names in usecols.

usecols : list-like or callable, optional

Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.

Pandas read_csv

Answers (2)

Related Questions