Dave

Reputation: 19

Pandas won't skip an empty line when using index_col

I am running into an issue when using pandas in Python.

I need to index my DataFrame by the country column, but the CSV file has an empty line right after the header row. It looks like this:

0 Televison, Physicians, and Life Expectancy
1 NaN, NaN, NaN, NaN, NaN, NaN
2 country, life expectancy, people/TV, people/physician, female life expectancy, male life expectancy
3 NaN, NaN, NaN, NaN, NaN, NaN (I need to skip this line)
4 value, value, value, value, value, value, 
5 value, value, value, value, value, value, 
...
...

I tried to skip the empty line between the header and the first data line like this:

tvdf = pd.read_csv(infile, sep=',', header=2, skiprows=[3], nrows=40, index_col='Country', skip_blank_lines=True)

As a result, the country column is successfully used as the index. However, neither skiprows nor skip_blank_lines seems to have any effect when index_col is set. My interpretation is that, with the country column as the index, pandas treats the empty (NaN) line as the first index label. When I try it without index_col, the empty line is skipped automatically, without any skiprows or skip_blank_lines arguments.
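A minimal sketch of the behaviour described above, assuming the "empty" line in the file actually consists only of commas. Pandas parses such a line as a row of NaN values rather than as a blank line, so it ends up as a NaN index label. The CSV text is a hypothetical stand-in with made-up country names and values, and filtering on index.notna() after reading is just one possible workaround.

```python
import io

import pandas as pd

# Hypothetical stand-in for the question's file (made-up names and values):
# the header row is followed by a comma-only line, which is NOT a blank line.
CSV_TEXT = (
    "country,life expectancy,people/TV\n"
    ",,\n"
    "CountryA,70.5,4\n"
    "CountryB,53.5,315\n"
)

tvdf = pd.read_csv(io.StringIO(CSV_TEXT), index_col="country")
print(tvdf.index.tolist())  # the first index label is NaN

# Keep only the rows whose index label is present.
tvdf = tvdf[tvdf.index.notna()]
print(tvdf)
```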

I have searched online for this issue but did not find anything related. At this stage I could edit the CSV file and delete the empty line manually, but does anyone have experience dealing with this?
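Instead of deleting the line by hand, one option is to drop comma-only lines in Python before pandas parses the text. This sketch assumes the file is small enough to read into memory; drop_comma_only_lines is a hypothetical helper, and the sample text uses made-up values. For a real file, raw would come from something like open(infile).read().

```python
import io

import pandas as pd


def drop_comma_only_lines(text: str) -> str:
    """Hypothetical helper: remove lines made up only of commas and whitespace."""
    kept = [line for line in text.splitlines() if line.replace(",", "").strip()]
    return "\n".join(kept) + "\n"


# Made-up sample in the same layout as the question's file.
raw = (
    "Televison, Physicians, and Life Expectancy\n"
    ",,,,,\n"
    "country,life expectancy,people/TV\n"
    ",,,,,\n"
    "CountryA,70.5,4\n"
    "CountryB,53.5,315\n"
)

cleaned = drop_comma_only_lines(raw)
# After cleaning, line 0 is the title and line 1 is the header.
tvdf = pd.read_csv(io.StringIO(cleaned), skiprows=1, index_col="country")
print(tvdf)
```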

I appreciate your help!

Upvotes: 0

Views: 1700

Answers (3)

curious

Reputation: 17

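Create a new DataFrame, test_scores, save it to test_scores.csv and read it back:

import pandas as pd

test_scores = pd.DataFrame({'id': [1, 2, '', 4, 5],
                            'first_name': ['Sachin', 'Dravid', '', 'Virat', 'Yuvraj'],
                            'scores': [150, 210, '', 125, 75],
                            'state': ['Mumbai', 'Karnataka', '', 'Delhi', 'Punjab']})

# Write without the index so the file contains only these four columns
test_scores.to_csv('test_scores.csv', index=False)

skip = pd.read_csv(filepath_or_buffer='test_scores.csv', sep=',', header=0)

skip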

OUTPUT:

    id first_name  scores      state
0  1.0     Sachin   150.0     Mumbai
1  2.0     Dravid   210.0  Karnataka
2  NaN        NaN     NaN        NaN
3  4.0      Virat   125.0      Delhi
4  5.0     Yuvraj    75.0     Punjab

ISSUE: skip_blank_lines does not remove the empty row at index 2.
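A sketch that makes the cause visible, assuming the DataFrame is written with index=False: the empty row ends up in the CSV as the line ",,,", which is not blank, so skip_blank_lines leaves it in. Dropping all-NaN rows after reading with dropna(how='all') is one way to remove it. Here the CSV is kept in memory (to_csv with no path returns a string) instead of being written to test_scores.csv.

```python
import io

import pandas as pd

test_scores = pd.DataFrame({'id': [1, 2, '', 4, 5],
                            'first_name': ['Sachin', 'Dravid', '', 'Virat', 'Yuvraj'],
                            'scores': [150, 210, '', 125, 75],
                            'state': ['Mumbai', 'Karnataka', '', 'Delhi', 'Punjab']})

# With no path, to_csv returns the CSV text instead of writing a file.
csv_text = test_scores.to_csv(index=False)
print(csv_text.splitlines())  # the empty row is written as ',,,'

skip = pd.read_csv(io.StringIO(csv_text))
# Drop rows in which every column is NaN, then renumber the index.
skip = skip.dropna(how='all').reset_index(drop=True)
print(skip)
```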

Upvotes: 0

smci

Reputation: 33950

skip_blank_lines=True (the default) fixes this, with no need to pass the line numbers of blank lines manually.
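Note that skip_blank_lines only applies to lines that are completely empty. A small sketch to check this with made-up two-column data: a truly empty line is dropped, while a line containing only a delimiter is parsed as a row of NaN values.

```python
import io

import pandas as pd

blank_line = "a,b\n1,2\n\n3,4\n"   # contains a truly empty line
comma_line = "a,b\n1,2\n,\n3,4\n"  # contains a delimiter-only line

skipped = pd.read_csv(io.StringIO(blank_line), skip_blank_lines=True)
not_skipped = pd.read_csv(io.StringIO(comma_line), skip_blank_lines=True)

print(len(skipped))      # 2
print(len(not_skipped))  # 3, the middle row is all NaN
```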

Upvotes: 0

piRSquared

Reputation: 294468

Use skiprows=[0, 1, 3] to skip the title line and the two empty lines (lines 1 and 3):

import pandas as pd

pd.read_clipboard(
    sep=',', skipinitialspace=True, skiprows=[0, 1, 3]
)
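The same skiprows approach with read_csv on a file-like object, plus index_col='country' as in the question; read_clipboard forwards these keyword arguments to read_csv. The text below is a hypothetical reconstruction of the layout, with made-up values and comma-only lines standing in for the empty rows.

```python
import io

import pandas as pd

# Hypothetical text in the question's layout (values are made up).
# Physical lines: 0 = title, 1 = empty row, 2 = header, 3 = empty row, 4+ = data.
CSV_TEXT = """Televison, Physicians, and Life Expectancy
, , , , ,
country, life expectancy, people/TV, people/physician, female life expectancy, male life expectancy
, , , , ,
CountryA, 70.5, 4, 370, 74, 67
CountryB, 53.5, 315, 6166, 53, 54
"""

tvdf = pd.read_csv(
    io.StringIO(CSV_TEXT),
    sep=",",
    skipinitialspace=True,  # drop the space after each comma
    skiprows=[0, 1, 3],     # title line and both empty rows
    index_col="country",    # line 2 becomes the header, so this column exists
)
print(tvdf)
```

Because the skipped line numbers refer to physical lines in the file, the header is simply the first line that remains, and no NaN row reaches the index.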


Upvotes: 1
