Reputation: 942
I am trying to load the Excel file from the following URL into a dataframe using Python 3.5 and Pandas:
link = "https://hub.coursera-notebooks.org/user/ejquqxfjajkufidbixxvkx/notebooks/Energy%20Indicators.xls"
First I tried to download the file manually using urllib.request in order to read it right after:
import urllib.request
urllib.request.urlretrieve(link, "Energy Indicators.xls")
This did produce a file named "Energy Indicators.xls", but it is not a valid xls file. It looks like an HTML page saved with an .xls extension.
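One way to confirm that suspicion is to look at the first bytes of the downloaded file. The sketch below is only an illustration: the helper name sniff_file_type is made up for this example, and it relies on the fact that legacy .xls workbooks are OLE2 compound files starting with the signature D0 CF 11 E0 A1 B1 1A E1. A result of "html" would suggest the server returned a web page (possibly a login page, which is an assumption and not something the traceback confirms) instead of the workbook.

```python
# Signature of an OLE2 compound file, the container used by legacy .xls workbooks.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_file_type(path):
    """Hypothetical helper: guess whether a file is a real .xls or an HTML page."""
    with open(path, "rb") as f:
        head = f.read(512)  # the first few hundred bytes are enough

    if head.startswith(OLE2_MAGIC):
        return "xls"

    # HTML pages usually start with a doctype or an <html> tag,
    # possibly preceded by whitespace.
    text = head.lstrip().lower()
    if text.startswith(b"<!doctype html") or text.startswith(b"<html"):
        return "html"

    return "unknown"
```

Calling sniff_file_type("Energy Indicators.xls") on the downloaded file shows which case applies before any pandas parsing is attempted.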
Then I tried to load the file directly using read_csv:
energy = pd.read_csv(link, skiprows=16, header=0, skipfooter=38)
But that raised an error: "pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 12, saw 2". If I read it without the skiprows, header, and skipfooter arguments, I get a different error: "ValueError: Expected 1 fields in line 41, saw 3".
Any ideas? BTW, I am using macOS Sierra and PyCharm Community Edition 2016.3.
Upvotes: 1
Views: 3777
Reputation: 226
For this specific Coursera exercise (this is not a general solution), pass just the file name 'Energy Indicators.xls' to read_excel instead of the whole URL. Note that the file is an Excel workbook, so read_excel is the right function, not read_csv:
energy = pd.read_excel('Energy Indicators.xls',...)
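A slightly fuller sketch of that call follows, under a few assumptions: the workbook sits in the current working directory, the skiprows and skipfooter values are simply the ones from the question (they may need adjusting), and the wrapper name load_energy is invented for this example. Recent pandas versions spell the footer argument skipfooter; older releases used skip_footer. Reading .xls files also requires the xlrd package.

```python
import os


def load_energy(path="Energy Indicators.xls", skiprows=16, skipfooter=38):
    """Hypothetical wrapper: read the local workbook into a DataFrame."""
    # Fail early with a clear message if the file is not where we expect it.
    if not os.path.exists(path):
        raise FileNotFoundError(
            "{!r} not found in {!r}".format(path, os.getcwd())
        )

    # Imported here so the existence check above runs before pandas
    # (and xlrd, which pandas needs for .xls files) is required.
    import pandas as pd

    # read_excel, not read_csv: the file is a binary Excel workbook.
    return pd.read_excel(path, skiprows=skiprows, header=0, skipfooter=skipfooter)
```

With the file in place, energy = load_energy() returns the DataFrame; if the path is wrong, the error message shows the directory that was searched.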
Upvotes: 2