Create dataframe from specific column

Question

I am trying to create a dataframe in Pandas from the AB column in my csv file. (AB is the 27th column).

I am using this line:

df = pd.read_csv(filename, error_bad_lines = False, usecols = [27])

... which is resulting in this error:

ValueError: Usecols do not match names.

I'm very new to pandas. Could someone point out what I'm doing wrong?

MaxU - stand with Ukraine · Accepted Answer

Here is a small demo:

CSV file (without a header, i.e. there are NO column names):

1,2,3,4,5,6,7,8,9,10
11,12,13,14,15,16,17,18,19,20

We are going to read only the 8th column:

In [1]: fn = r'D:\temp\.data\1.csv'

In [2]: df = pd.read_csv(fn, header=None, usecols=[7], names=['col8'])

In [3]: df
Out[3]:
   col8
0     8
1    18

PS: pay attention to header=None, usecols=[7], names=['col8']. The index 7 selects the 8th column because usecols is zero-based.
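If you want to reproduce this without a file on disk, here is a minimal sketch that feeds the same two rows through io.StringIO. The in-memory buffer and the name csv_text are my own stand-ins for the real path, not part of the demo above:

```python
import io

import pandas as pd

# Same two rows as the demo file; there is no header row.
# (csv_text is a stand-in for the file on disk.)
csv_text = "1,2,3,4,5,6,7,8,9,10\n11,12,13,14,15,16,17,18,19,20\n"

# usecols is zero-based, so index 7 selects the 8th column.
# header=None tells pandas that the first row is data, not labels.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    usecols=[7],
    names=["col8"],
)

print(df)
```

Swapping io.StringIO(csv_text) for a real path should behave the same way, assuming the file has the same layout.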

If you don't pass the header=None and names parameters, the first row will be used as the header, so its value becomes the column label and only one data row remains:

In [6]: df = pd.read_csv(fn, usecols=[7])

In [7]: df
Out[7]:
    8
0  18

In [8]: df.columns
Out[8]: Index(['8'], dtype='object')
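To see the two behaviours side by side, here is a small sketch on the same data. It again reads from an in-memory string (my assumption, standing in for the file), and the variable names are just for illustration:

```python
import io

import pandas as pd

# Stand-in for the demo file: two rows, no header.
csv_text = "1,2,3,4,5,6,7,8,9,10\n11,12,13,14,15,16,17,18,19,20\n"

# Default behaviour: the first row is consumed as column labels,
# so the 8th field of row one ("8") becomes the column name.
with_header = pd.read_csv(io.StringIO(csv_text), usecols=[7])

# header=None keeps both rows as data; without names, the column
# is labelled by its integer position.
without_header = pd.read_csv(io.StringIO(csv_text), header=None, usecols=[7])

print(with_header.columns.tolist(), len(with_header))
print(without_header.columns.tolist(), len(without_header))
```

The first frame should come back with a single row under the string label '8', the second with both rows under the integer label 7.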

And if we try to read only the last (10th) column by passing 10:

In [9]: df = pd.read_csv(fn, usecols=[10])
... skipped ...
ValueError: Usecols do not match names.

This fails because pandas counts columns starting from 0, so the 10th column has index 9 and we have to do it this way:

In [12]: df = pd.read_csv(fn, usecols=[9], names=['col10'])

In [13]: df
Out[13]:
   col10
0     10
1     20
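If the column is known by a spreadsheet letter (as with AB in the question), one way to avoid off-by-one mistakes is to convert the letter to a zero-based index and check it against the number of columns before reading. This is only a sketch: column_letter_to_index is a hypothetical helper of mine, not a pandas function, and it assumes plain A-Z labels. Note that AB maps to index 27, which means the file needs at least 28 columns.

```python
import io

import pandas as pd


def column_letter_to_index(letters: str) -> int:
    """Convert a spreadsheet label such as 'AB' to a zero-based index.

    Hypothetical helper, not part of pandas. Assumes letters A-Z only.
    """
    index = 0
    for ch in letters.upper():
        # Treat the label as a base-26 number with digits A=1 .. Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1  # shift to zero-based for usecols


# Stand-in for the demo file: two rows, ten columns, no header.
csv_text = "1,2,3,4,5,6,7,8,9,10\n11,12,13,14,15,16,17,18,19,20\n"

# Read a single row to find out how many columns the file has.
n_cols = pd.read_csv(io.StringIO(csv_text), header=None, nrows=1).shape[1]

pos = column_letter_to_index("J")  # J is the 10th column, i.e. index 9
if pos >= n_cols:
    raise IndexError(f"column index {pos} is out of range for {n_cols} columns")

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    usecols=[pos],
    names=["col10"],
)

print(df)
```

Checking the index up front gives a clearer message than letting read_csv fail, and the exact error raised for an out-of-range usecols value may differ between pandas versions.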
