Removing UTF-8 encoding in Python

Question

I am trying to scrape the FAA Passengers & Cargo page for its data. I can't convert the scraped text into plain values, and the text encoding seems to be the problem.

The code I used is:

from __future__ import print_function
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://www.faa.gov/data_research/passengers_cargo/unruly_passengers/"
r = requests.get(url)
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(r.content, "html.parser")
links = soup.find_all("tbody")

for link in links:
    print(link.text)

Output1

This prints in Year and Total format, but when I append the text to a list, the encoding ruins the data. You can see that in Output1.

names = []
for link in links:
    names.append(link.text)
names = map(lambda x: x.strip().encode('ascii'), names)
print(names)

Output2

The desired output is the Year and Total values in a form I can analyze.

haoyu cai · Accepted Answer

You can call find_all on the tr tags and then on the td tags inside each row, like this:

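import requests
from bs4 import BeautifulSoup

url = "https://www.faa.gov/data_research/passengers_cargo/unruly_passengers/"
r = requests.get(url)
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(r.content, "html.parser")
rows = soup.find_all("tr")

data = []
for row in rows:
    tds = row.find_all("td")
    # Rows with fewer than two <td> cells (for example header rows) are skipped
    if len(tds) >= 2:
        data.append({"year": tds[0].text.strip(), "total": tds[1].text.strip()})

print(data)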

It worked for me.

Hope it helps.
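
Since the goal is to run analyses, the year and total strings in the `data` list above probably need to become numbers. Here is a rough sketch of one way to do that, assuming the cells may carry stray whitespace, non-breaking spaces, or thousands separators. I have not checked what the page actually contains. `clean_cell` is a hypothetical helper name and the sample rows are made up for illustration.

```python
import unicodedata


def clean_cell(text):
    """Normalize a scraped table cell and return an int, or None if it is not numeric."""
    # NFKC normalization turns non-breaking spaces (U+00A0) into ordinary spaces
    normalized = unicodedata.normalize("NFKC", text).strip()
    # Drop thousands separators such as the comma in "1,234"
    digits = normalized.replace(",", "")
    return int(digits) if digits.isdecimal() else None


# Hypothetical rows shaped like the `data` list above; the values are made up
sample = [
    {"year": "\n2001\xa0", "total": " 1,234 "},
    {"year": "2002", "total": "567\n"},
]

cleaned = [
    {"year": clean_cell(row["year"]), "total": clean_cell(row["total"])}
    for row in sample
]
print(cleaned)
```

If you want a table afterwards, a list of dicts like `cleaned` can be passed straight to `pd.DataFrame(cleaned)`.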
