j riot
j riot

Reputation: 544

Python Pandas read_csv not importing correctly

I have a .xls file that looks similar to this...

Value of Construction Put in Place...
(Millions of Dollars....)
Blank Row
Date    Total_Construction Total Residential Total Nonresidential...Columns 
Dec-15  1,116,570          435,454           681,217 
Nov-15  1,115,966          432,295           683,671
Oct-15  1,122,749          431,164           691,585   
.
.
.

I am trying to import the file to get the following:

Date    Total_Construction Total Residential Total Nonresidential 
Dec-15  1,116,570          435,454           681,217 
Nov-15  1,115,966          432,295           683,671
Oct-15  1,122,749          431,164           691,585   
.
.
. 

With the following code:

for chunk in pandas.read_csv('/PATH/totsatime.xls',
                 names      = ['Date', 'Total Residential', 'Total Nonresidential'],
                 header     = 4,
                 chunksize  = 1,
                 skiprows   = range(1, 4),
                 thousands  = ','):

    if chunk['Date'] == 'Dec-01':
        break

    else:
        df = pandas.DataFrame(chunk)

However, I end up with the following:

Date             Total Residential     Total Nonresidential
Lodging          NaN                   NaN
Office          NaN                   NaN
Commercial      NaN                   NaN
Health care     NaN                   NaN

The dates end up being formatted from the columns that I am not importing. Any advice would greatly be appreciated.

Thank you in advance.

Upvotes: 0

Views: 820

Answers (1)

mh00h
mh00h

Reputation: 1856

Don't use read_csv to import an xls file. Use read_excel. See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

Upvotes: 5

Related Questions