ka4c

Reputation: 99

Convert a table from an HTML webpage into a pandas DataFrame

I'm trying to get a table from a webpage and convert it into a DataFrame for analysis. I've used the BeautifulSoup package to fetch the URL and parse the table, but I can't work out how to get the parsed rows into a DataFrame. My code is below:

from bs4 import BeautifulSoup as bs
from urllib import request

source = request.urlopen("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").read()
soup = bs(source, "html.parser")

table = soup.table

table_rows = table.find_all("tr")

for tr in table_rows:
    td = tr.find_all("td")
    row = [i.text for i in td]
    print(row)

This prints each row, but I'm not sure how to convert the rows into a DataFrame. Any ideas?

Upvotes: 0

Views: 170

Answers (2)

sushanth

Reputation: 8302

You can use pandas' read_html:

import pandas as pd

# read_html returns a list of DataFrames, one per table found on the page;
# pick the one you need by index.
table_list = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")

print(table_list[0])

  Postal Code           Borough               Neighborhood
0         M1A      Not assigned                        NaN
1         M2A      Not assigned                        NaN
2         M3A        North York                  Parkwoods
3         M4A        North York           Victoria Village
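
If the page holds several tables, picking one by index can be fragile. As a minimal sketch, the `match` argument of read_html keeps only tables whose text contains a given string. The inline `SAMPLE_HTML` below is a made-up stand-in for the Wikipedia page so the snippet runs offline; read_html also needs a parser backend installed (lxml, or bs4 with html5lib).

```python
from io import StringIO

import pandas as pd

# Hypothetical inline sample standing in for the real page (assumption).
SAMPLE_HTML = """
<table>
  <tr><th>Unrelated</th></tr>
  <tr><td>ignore me</td></tr>
</table>
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

# match= keeps only tables whose text contains the given string,
# so the unrelated first table is skipped.
tables = pd.read_html(StringIO(SAMPLE_HTML), match="Postal Code")
df = tables[0]

print(df)
```

With the real URL in place of the StringIO object, the same call should return just the postal code table, assuming its header text is still "Postal Code".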

Upvotes: 0

aruN

Reputation: 161

Please try this:

from bs4 import BeautifulSoup as bs
from urllib.request import urlopen
import pandas as pd

source = urlopen("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").read()
soup = bs(source, "html.parser")

table = soup.table

table_rows = table.find_all("tr")

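postal_codes = []

for tr in table_rows:
    td = tr.find_all("td")
    # strip() removes the trailing newline Wikipedia leaves in each cell
    row = [i.text.strip() for i in td]
    postal_codes.append(row)

# The header row has only <th> cells, so it yields an empty list; drop it
postal_codes.pop(0)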

df = pd.DataFrame(postal_codes, columns=['PostalCode', 'Borough', 'Neighborhood'])

print(df)
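
A possible variation on the loop above, as a rough sketch: read the column names from the table's `<th>` cells instead of hard-coding them, and skip any row that has no `<td>` cells rather than popping the first entry. The helper name `table_to_dataframe` and the inline `SAMPLE_HTML` are made up for illustration so the snippet runs without a network call.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical inline sample standing in for the Wikipedia table (assumption).
SAMPLE_HTML = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""


def table_to_dataframe(html):
    """Build a DataFrame from the first <table> in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")

    # Column names come from the header cells.
    headers = [th.get_text(strip=True) for th in table.find_all("th")]

    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows contain no <td> cells, so they are skipped
            rows.append(cells)

    return pd.DataFrame(rows, columns=headers)


df = table_to_dataframe(SAMPLE_HTML)
print(df)
```

This assumes a simple table with one header row and no merged cells; tables that use rowspan or colspan would need extra handling.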

Upvotes: 1
