I tried scraping the webpage for the Passengers & Cargo data, but I couldn't convert the results into normal data. The web encoding seems to be the problem.
The code I used is:
from __future__ import print_function
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.faa.gov/data_research/passengers_cargo/unruly_passengers/"
r = requests.get(url)
# Name the parser explicitly; otherwise BeautifulSoup warns and picks whichever one is installed
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("tbody")
for link in links:
    # .text joins all text inside the <tbody> into one string
    print(link.text)
Output1
This prints the data in the format Year and Total. But when I append it to a list, the encoding ruins the data. You can see that in Output1.
names = []
for link in links:
    names.append(link.text)
names = map(lambda x: x.strip().encode('ascii'), names)
print(names)
Output2
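Output2 itself isn't reproduced here, so this is a guess at what it shows, sketched offline. Under Python 3, map() returns a lazy map object rather than a list, and .encode('ascii') turns each string into bytes, so printing shows b'...' with literal \n escapes. Under Python 2, printing a list likewise shows each string's repr with \n escapes. Either way the data isn't corrupted; the whole <tbody> is simply one string. The tbody_text value below is a hypothetical stand-in, not the FAA figures, and assumes cells are separated by newlines.

```python
# Hypothetical stand-in for one <tbody>'s .text; the real page's whitespace may differ
tbody_text = "\n2000\n10\n\n2001\n20\n"

# What the question's code does under Python 3
names = [tbody_text]
encoded = map(lambda x: x.strip().encode('ascii'), names)
print(encoded)        # a lazy map object, not the values
as_list = list(encoded)
print(as_list)        # [b'2000\n10\n\n2001\n20'], one bytes value for the whole table

# Keeping str and splitting on whitespace yields the individual cell values instead
cells = tbody_text.split()
print(cells)          # ['2000', '10', '2001', '20']
```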
The desired output is the Year and Total values, so that I can perform analyses on them.
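A minimal sketch of getting typed Year/Total pairs from a flat list of cell strings, such as what str.split() returns on a tbody's text. It assumes the cells strictly alternate year, total and are plain numbers, with totals possibly using commas as thousands separators; pairs_from_cells is a hypothetical helper name, and the sample values are made up.

```python
def pairs_from_cells(cells):
    """Group a flat [year, total, year, total, ...] list into (int, int) tuples.

    Assumes cells alternate year/total and that totals may contain commas
    as thousands separators (e.g. "1,234").
    """
    if len(cells) % 2:
        raise ValueError("expected an even number of cells (year, total pairs)")
    years = cells[0::2]   # every even-indexed cell
    totals = cells[1::2]  # every odd-indexed cell
    return [(int(y), int(t.replace(",", ""))) for y, t in zip(years, totals)]


# Made-up sample cells, not the FAA data
rows = pairs_from_cells(["2000", "10", "2001", "1,234"])
print(rows)  # [(2000, 10), (2001, 1234)]
```

If pandas is available, pd.DataFrame(rows, columns=["year", "total"]) turns these pairs into a frame for analysis. If any year cell carries a footnote marker, int() will raise and the cell needs cleaning first.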
You can use find_all with the tr and td tags, like this:
import requests
from bs4 import BeautifulSoup

url = "https://www.faa.gov/data_research/passengers_cargo/unruly_passengers/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
rows = soup.find_all("tr")  # every table row on the page
data = []
for row in rows:
    tds = row.find_all('td')  # data cells; header rows built from <th> give an empty list
    if len(tds) >= 2:
        # First cell is the year, second is the total
        data.append({'year': tds[0].text.strip(), 'total': tds[1].text.strip()})
print(data)
It works.
Hope it helps.
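To see what the tr/td walk above produces without hitting the network, here is the same idea sketched with the standard library's html.parser instead of BeautifulSoup (a swapped-in parser, not the answer's code). The HTML snippet is hypothetical and only shaped like a two-column Year/Total table; TableRowParser is an illustrative name.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being parsed
        self._cell = None # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no <td>, e.g. a <th> header row
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


# Hypothetical snippet, not the FAA page itself
html = """
<table>
  <thead><tr><th>Year</th><th>Total</th></tr></thead>
  <tbody>
    <tr><td>2000</td><td>10</td></tr>
    <tr><td>2001</td><td>1,234</td></tr>
  </tbody>
</table>
"""

parser = TableRowParser()
parser.feed(html)
data = [{"year": r[0], "total": r[1]} for r in parser.rows if len(r) >= 2]
print(data)  # [{'year': '2000', 'total': '10'}, {'year': '2001', 'total': '1,234'}]
```

The values are still strings at this point; strip the commas and convert with int() before doing any arithmetic on the totals.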