Reputation: 19
I am running into an issue with pandas in Python.
I need to index my DataFrame by the country column, but there is an empty line right after the header row. The CSV file looks like this:
0 Television, Physicians, and Life Expectancy
1 NaN, NaN, NaN, NaN, NaN, NaN
2 country, life expectancy, people/TV, people/physician, female life expectancy, male life expectancy
3 NaN, NaN, NaN, NaN, NaN, NaN (I need to skip this line)
4 value, value, value, value, value, value,
5 value, value, value, value, value, value,
...
...
I tried to skip the empty line between header and the first actual data line like this:
tvdf = pd.read_csv(infile, sep=',', header=2, skiprows=[3], nrows=40, index_col='Country', skip_blank_lines=True)
As a result, the country column is successfully set as the index. However, neither skiprows nor skip_blank_lines seems to take effect when index_col is used. My interpretation is that when the country column is used as the index, pandas treats the empty (NaN) line as the first index label, so the row is kept. When I run it without index_col, the lines without values are skipped automatically, with no skiprows or skip_blank_lines arguments at all.
I have searched online for this issue and did not find anything related. So at this stage, should I edit the CSV file and delete the empty line manually, or does anyone have experience dealing with this?
I appreciate your help!
Upvotes: 0
Views: 1700
Reputation: 17
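import pandas as pd

test_scores = pd.DataFrame({'id': [1, 2, '', 4, 5],
                            'first_name': ['Sachin', 'Dravid', '', 'Virat', 'Yuvraj'],
                            'scores': [150, 210, '', 125, 75],
                            'state': ['Mumbai', 'Karnataka', '', 'Delhi', 'Punjab']})
# Assumption: the frame was written to disk without its index before being
# read back (this step is not shown in the original post).
test_scores.to_csv('test_scores.csv', index=False)
skip = pd.read_csv(filepath_or_buffer='test_scores.csv', sep=',', header=0)
skip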
OUTPUT:
    id first_name  scores      state
0  1.0     Sachin   150.0     Mumbai
1  2.0     Dravid   210.0  Karnataka
2  NaN        NaN     NaN        NaN
3  4.0      Virat   125.0      Delhi
4  5.0     Yuvraj    75.0     Punjab
ISSUE: skip_blank_lines is not working; the empty row (index 2) is still read in as NaN.
Upvotes: 0
Reputation: 33950
Passing skip_blank_lines=True fixes this (there is no need to manually pass the line numbers of the blank lines).
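Whether this is enough may depend on what the "blank" line actually contains. As far as I understand it, skip_blank_lines applies to lines that are completely empty, while a line made up only of delimiters (such as ",,,") still parses as a row of NaN values. A minimal sketch of the difference, using an in-memory CSV with made-up rows loosely based on the test_scores example above, and dropna(how="all") as one possible cleanup (my suggestion, not something from the posts above):

```python
import io

import pandas as pd

# Hypothetical CSV text: one truly empty line and one delimiter-only line.
csv_text = (
    "id,first_name,scores,state\n"
    "1,Sachin,150,Mumbai\n"
    "\n"        # completely empty line
    ",,,\n"     # only delimiters, no values
    "4,Virat,125,Delhi\n"
)

# The completely empty line is skipped; the delimiter-only line is kept
# and parsed as a row of NaN values.
df = pd.read_csv(io.StringIO(csv_text), skip_blank_lines=True)

# Drop rows in which every column is NaN, then renumber the index.
cleaned = df.dropna(how="all").reset_index(drop=True)
print(cleaned)
```

Here df still contains the all-NaN row, while cleaned does not. Note that dropna(how="all") runs after parsing, so it would not help if the NaN row has already been picked up through index_col.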
Upvotes: 0
Reputation: 294468
Use skiprows=[0, 1, 3] to skip the title line, the line before the header, and the line after it:
pd.read_clipboard(
sep=',', skipinitialspace=True, skiprows=[0, 1, 3]
)
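The same skiprows list should also work with pd.read_csv and index_col when reading from a file instead of the clipboard. A rough sketch, assuming the "empty" lines contain only delimiters; the two data rows ("A" and "B") are invented placeholders and not values from the real dataset:

```python
import io

import pandas as pd

# Hypothetical stand-in for the file in the question (placeholder data rows).
raw = (
    "Television, Physicians, and Life Expectancy\n"
    ",,,,,\n"
    "country,life expectancy,people/TV,people/physician,"
    "female life expectancy,male life expectancy\n"
    ",,,,,\n"
    "A,70.5,4.0,370,74,67\n"
    "B,53.5,315.0,6166,53,54\n"
)

# skiprows takes 0-based file line numbers: drop the title (0), the empty
# line before the header (1), and the empty line after the header (3).
# The header is then the first remaining line, so the default header=0 applies.
tvdf = pd.read_csv(
    io.StringIO(raw),
    sep=",",
    skipinitialspace=True,
    skiprows=[0, 1, 3],
    index_col="country",
)
print(tvdf)
```

Because the header row becomes the first remaining line after skipping, header=2 from the question is not needed in this variant.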
Upvotes: 1