Reputation: 8297
import requests
import csv
from bs4 import BeautifulSoup
r = requests.get('https://pqt.cbp.gov/report/YYZ_1/12-01-2017')
soup = BeautifulSoup(r)
table = soup.find('table', attrs={ "class" : "table-horizontal-line"})
headers = [header.text for header in table.find_all('th')]
rows = []
for row in table.find_all('tr'):
    rows.append([val.text.encode('utf8') for val in row.find_all('td')])
with open('output_file.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(row for row in rows if row)
I am trying to parse all the table data on this webpage: https://pqt.cbp.gov/report/YYZ_1/12-01-2017
I am getting an error on the line soup = BeautifulSoup(r): TypeError: object of type 'Response' has no len(). I am also not sure whether my logic is correct. Please help me parse the table data.
Upvotes: 0
Views: 53
Reputation: 84475
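I would do it this way:
import pandas as pd
# read_html returns a list of DataFrames, one per <table> on the page
# (it needs lxml, or BeautifulSoup plus html5lib, to be installed)
result = pd.read_html("https://pqt.cbp.gov/report/YYZ_1/12-01-2017")
df = result[0]
# optionally drop an unwanted column:
# df = df.drop(labels='Unnamed: 8', axis=1)
df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8', index=False)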
Upvotes: 1
Reputation: 19184
The variable r is of type Response, not str; pass r.text or r.content instead. Also, there is no table with the class table-horizontal-line on that page. Did you mean results?
soup = BeautifulSoup(r.text, 'html.parser')
table = soup.find('table', attrs={"class" : "results"})
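Putting these fixes together, a minimal Python 3 sketch of the question's parsing logic might look like the following. It assumes the page's table really has the class results (as suggested above) and that bs4's built-in html.parser is adequate; the helper names parse_table and write_csv and the sample markup are hypothetical, used only so the example runs without network access.

```python
import csv
from bs4 import BeautifulSoup


def parse_table(html, table_class="results"):
    """Return (headers, rows) for the first <table> with the given class.

    Raises ValueError when no such table exists, instead of failing later
    with an AttributeError on None.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": table_class})
    if table is None:
        raise ValueError(f"no <table class={table_class!r}> found")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row holds <th>, not <td>, so it is skipped
            rows.append(cells)
    return headers, rows


def write_csv(path, headers, rows):
    # Python 3: text mode with newline="" as the csv module expects;
    # no manual .encode() is needed.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)


# Hypothetical sample markup standing in for the downloaded page.
sample = """
<table class="results">
  <tr><th>Col 1</th><th>Col 2</th></tr>
  <tr><td>a</td><td>b</td></tr>
  <tr><td>c</td><td>d</td></tr>
</table>
"""
headers, rows = parse_table(sample)
print(headers)  # ['Col 1', 'Col 2']
print(rows)     # [['a', 'b'], ['c', 'd']]
```

With the live page, the input would be the response body, e.g. headers, rows = parse_table(requests.get(url).text), followed by write_csv('output_file.csv', headers, rows). Opening the file in text mode with newline='' replaces the Python 2 style 'wb' mode and .encode('utf8') calls, which make csv.writer raise TypeError in Python 3.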
Upvotes: 0
Reputation: 42050
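Try passing r.content instead of the Response object itself:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://pqt.cbp.gov/report/YYZ_1/12-01-2017')
soup = BeautifulSoup(r.content, 'html.parser')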
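To illustrate the difference between the two attributes: r.content is the raw bytes of the response body, while r.text is a str that requests has already decoded. BeautifulSoup accepts either; given bytes, it detects the encoding itself, or uses from_encoding if you supply one. A small offline sketch, with a hypothetical byte string standing in for a real response:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for r.content (bytes) and r.text (str).
raw = "<table class='results'><tr><td>café</td></tr></table>".encode("utf-8")
decoded = raw.decode("utf-8")

# With bytes, from_encoding tells BeautifulSoup which codec to use
# instead of guessing; with str, no decoding is needed at all.
soup_from_bytes = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")
soup_from_text = BeautifulSoup(decoded, "html.parser")

print(soup_from_bytes.td.get_text())  # café
print(soup_from_text.td.get_text())   # café
```

Passing the Response object itself fails because BeautifulSoup expects markup (a string, bytes, or an open file), which is why it tries to call len() on it.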
Upvotes: 0