Antonio Serrano

Reputation: 942

Using pandas to download/load an .xls file from a URL

I am trying to load the Excel file from the following URL into a dataframe using Python 3.5 and Pandas:

link = "https://hub.coursera-notebooks.org/user/ejquqxfjajkufidbixxvkx/notebooks/Energy%20Indicators.xls"

First, I tried to download the file manually with urllib.request so that I could read it afterwards:

import urllib.request
urllib.request.urlretrieve(link, "Energy Indicators.xls")

I did get a file named "Energy Indicators.xls", but it is not a valid .xls file; it looks more like an HTML page saved with an .xls extension.
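One way to confirm this is to look at the first bytes of the downloaded file. The sketch below relies on the usual file signatures: legacy .xls workbooks are OLE2 compound files starting with D0 CF 11 E0 A1 B1 1A E1, .xlsx files are ZIP archives starting with PK\x03\x04, and an HTML page normally starts with "<" after optional whitespace or a UTF-8 BOM. The helper names `sniff_file_type` and `sniff_path` are hypothetical, not part of any library.

```python
# Hypothetical helpers: guess a file's real type from its leading bytes.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx (ZIP container)
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_file_type(head: bytes) -> str:
    """Return 'xls', 'xlsx', 'html' or 'unknown' for the first bytes of a file."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    text = head
    if text.startswith(UTF8_BOM):  # drop a UTF-8 byte-order mark if present
        text = text[len(UTF8_BOM):]
    text = text.lstrip()  # skip leading ASCII whitespace
    if text[:1] == b"<":  # markup such as <!DOCTYPE html> or <html>
        return "html"
    return "unknown"


def sniff_path(path: str) -> str:
    """Read only the first 512 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_file_type(f.read(512))
```

Calling `sniff_path("Energy Indicators.xls")` on the downloaded file should then tell you whether you really received a workbook or a web page.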

Then I tried to load the file directly using read_csv:

import pandas as pd
energy = pd.read_csv(link, skiprows=16, header=0, skipfooter=38)

But I got this error: "pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 12, saw 2". If I try to read it without the skiprows, header, etc. arguments, I get a different error: "ValueError: Expected 1 fields in line 41, saw 3".
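Both errors are consistent with read_csv receiving an HTML page instead of CSV data: the parser takes the expected number of fields from the first row it reads and fails as soon as a later row has more. The stdlib sketch below counts comma-separated fields per line with the `csv` module to show how an HTML page produces such ragged rows; the `field_counts` name and the sample page are made up for illustration and are not the actual content behind the URL.

```python
import csv
import io


def field_counts(text):
    """Return the number of comma-separated fields on each non-empty line."""
    reader = csv.reader(io.StringIO(text))
    return [len(row) for row in reader if row]


# A made-up HTML fragment standing in for what the URL returned.
sample_html = """<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<script>var cfg = {a: 1, b: 2, c: 3};</script>
</head>
"""

print(field_counts(sample_html))  # prints [1, 1, 1, 2, 3, 1]
```

The first row has 1 field, so a CSV parser expecting a consistent width would fail on the `<meta>` line with a message of the form "Expected 1 fields ..., saw 2".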

Any ideas? I am using macOS Sierra and PyCharm Community Edition 2016.3.

Upvotes: 1

Views: 3777

Answers (1)

Eduard3192993

Reputation: 226

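For this specific Coursera exercise (not as a general solution), don't pass the whole URL to the read_excel function; pass just the file name 'Energy Indicators.xls'. The URL appears to point at the notebook server on Coursera's hub rather than at the raw file, while the file itself sits in the same directory as the notebook, so a relative path is enough: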

energy = pd.read_excel('Energy Indicators.xls',...)
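Outside the Coursera environment, a more general pattern is to download the bytes yourself, check that they really are a spreadsheet, and only then hand them to pandas. This is a sketch under several assumptions: the URL must be publicly downloadable (the notebook URL in the question presumably sits behind a Coursera login, which would explain the HTML response), reading legacy .xls files requires the xlrd package to be installed, and `fetch_bytes` and `excel_from_bytes` are hypothetical helper names.

```python
import io
import urllib.request

XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2)
XLSX_MAGIC = b"PK\x03\x04"                       # .xlsx (ZIP)


def fetch_bytes(url, timeout=30):
    """Download the raw response body of a URL (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def excel_from_bytes(data, **read_excel_kwargs):
    """Parse spreadsheet bytes with pandas, refusing obvious non-Excel payloads."""
    if not (data.startswith(XLS_MAGIC) or data.startswith(XLSX_MAGIC)):
        preview = data[:60].decode("utf-8", errors="replace")
        raise ValueError("Not an Excel file; response starts with: %r" % preview)
    import pandas as pd  # imported lazily so the check above works without pandas
    # Extra keyword arguments (header, skiprows, ...) go straight to read_excel.
    return pd.read_excel(io.BytesIO(data), **read_excel_kwargs)


# Usage (needs network access and a publicly downloadable file):
# energy = excel_from_bytes(fetch_bytes(some_public_url), header=0)
```

Failing early with the first few bytes of the response makes an HTML login or redirect page obvious, instead of surfacing later as a confusing parser error.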

Upvotes: 2
