Shan

Reputation: 19243

How to specify exact number of columns in pandas

I have text files to read that have no headers, so I specified the parameter

  header=None

This is fine.

I am using the following statement to read the files

  pd.read_csv(fname, sep='\t', header=None, quotechar=None, quoting=3)

So, I am using the tab separator.

Following is the sample file

   a    b   c
   a    b   c
   a    b   c

The file above is read fine. But some of the files look as follows

   a      
   a    b   c
   a    b   c
   a    b   c

And for this file the error is as follows

    pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 3
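
A minimal sketch that reproduces this kind of failure, assuming the short first line really contains a single field with no trailing tabs. The in-memory `data` string is a hypothetical stand-in for the file:

```python
import io
import pandas as pd

# Hypothetical stand-in for the problematic file: the first line has
# one field, the remaining lines have three tab-separated fields.
data = "a\na\tb\tc\na\tb\tc\na\tb\tc\n"

try:
    pd.read_csv(io.StringIO(data), sep="\t", header=None)
    raised = False
except pd.errors.ParserError as exc:
    # The parser sized the table from the first line, so a later,
    # wider line is rejected.
    raised = True
    print(exc)
```

With no header and no column names, the parser takes the column count from the first line, which is why a short first row breaks the whole read.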

The problem can't be solved by skipping the first line, because we can't be sure the second line is in the correct format either. What I really need is a way to specify the expected number of columns.

I know that missing values can be filled in.

But how can I tell the CSV reader how many columns to expect, so that a short first row doesn't confuse it? Skipping only the first row isn't enough, since the second row may be malformed too.

Thanks

Cheers

Upvotes: 1

Views: 85

Answers (2)

mee

Reputation: 718

You can try specifying that your separator is a tab, and every missing value will be filled with NaN:

d = pd.read_csv('test.csv', sep='\t', header=None)

and get:

   0    1    2
0  a  NaN  NaN
1  a    b    c
2  a    b    c
3  a    b    c

Upvotes: 0

vercelli

Reputation: 4757

The names parameter did the trick:

df = pd.read_csv(fname, sep='\t', header=None, names=['A', 'B', 'C'])

Returns:

   A    B    C
0  a  NaN  NaN
1  a    b    c
2  a    b    c
3  a    b    c
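
If the column labels don't matter and only the count does, the same idea can be sketched with generated names. This is a minimal example, assuming the expected width is known up front; `n_cols` and the in-memory `data` string are hypothetical stand-ins, not part of the original question:

```python
import io
import pandas as pd

# Hypothetical stand-in for the file: a short first line followed by
# three-field, tab-separated lines.
data = "a\na\tb\tc\na\tb\tc\na\tb\tc\n"

n_cols = 3  # assumed: the number of columns you expect

# Passing exactly n_cols names fixes the table width up front, so
# short rows are padded with NaN instead of raising an error.
df = pd.read_csv(
    io.StringIO(data),
    sep="\t",
    header=None,
    names=list(range(n_cols)),
)
print(df)
```

Integer names keep the result looking like a header-less read, while the width no longer depends on the first line.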

Upvotes: 1
