Reputation: 1814
I am trying to download the Excel file from this page: https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=notificationsList# and then extract data from the applicable cells.
Here is the code that I am using:
import os
import requests

os.chdir('Path')  # placeholder for the directory to save into
dls = 'https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=ExportToExcel&StartRow=0'
resp = requests.get(dls)
# Write the raw response bytes to disk
with open('tester.xls', 'wb') as output:
    output.write(resp.content)
The download succeeds, but the formatting of the resulting file is completely messed up (perhaps because of the XML?).
I tried changing the file type but it did not help.
Any help is greatly appreciated!
Upvotes: 0
Views: 39
Reputation: 11515
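import pandas as pd

# read_html returns a list of DataFrames, one per <table> found; take the first
df = pd.read_html(
    "https://webgate.ec.europa.eu/rasff-window/portal/index.cfm?event=notificationsList")[0]
# Drop the last column of the table
df.drop(df.columns[-1], axis=1, inplace=True)
print(df)
df.to_csv("data.csv", index=False)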
Output: view-online
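If you still want to work with the exported file, it may help to check what the server actually sent before blaming the formatting. This is only a minimal sketch: I have not verified what the export URL returns, and `sniff_format` is a hypothetical helper name, not part of requests or pandas. It looks at the leading bytes and reports whether they look like a binary .xls, a zip container (as .xlsx files are), XML, or HTML.

```python
# Hypothetical helper: guess a file's real format from its first bytes.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls container
ZIP_MAGIC = b"PK\x03\x04"                       # zip archive, e.g. .xlsx


def sniff_format(data: bytes) -> str:
    """Return 'xls', 'xlsx', 'xml', 'html' or 'unknown' for the given bytes."""
    if data.startswith(OLE2_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        # Any zip archive matches here; .xlsx is only the likely case.
        return "xlsx"
    # Skip a UTF-8 byte order mark and leading whitespace, ignore case.
    head = data[:512].lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if head.startswith(b"<?xml"):
        return "xml"
    if head.startswith((b"<!doctype html", b"<html", b"<table")):
        return "html"
    return "unknown"


# Demo on sample bytes; with the question's code you would pass resp.content.
print(sniff_format(b"<?xml version='1.0'?><Workbook/>"))
```

If the result is `xml` or `html`, the `.xls` extension is only a label, which would explain why changing the file type did not help. In that case parsing the content as markup is probably more reliable than opening it as a spreadsheet.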
Upvotes: 1