Reputation: 99
I'm trying to obtain a table from a webpage and convert it into a DataFrame to be used in analysis. I've used the BeautifulSoup package to fetch the URL and parse the table, but I can't work out how to get the data into a DataFrame. My code is below:
from bs4 import BeautifulSoup as bs
from urllib import request
source = request.urlopen("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").read()
soup = bs(source, "html.parser")
table = soup.table
table_rows = table.find_all("tr")
for tr in table_rows:
    td = tr.find_all("td")
    row = [i.text for i in td]
    print(row)
By doing this I can see each row, but I'm not sure how to convert the rows to a DataFrame. Any ideas?
Upvotes: 0
Views: 170
Reputation: 8302
You can use pandas' read_html:
import pandas as pd
# read_html reads every table on the page and returns a list of DataFrames; pick the one you need
table_list = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
print(table_list[0])
  Postal Code       Borough      Neighborhood
0         M1A  Not assigned               NaN
1         M2A  Not assigned               NaN
2         M3A    North York         Parkwoods
3         M4A    North York  Victoria Village
Upvotes: 0
Reputation: 161
Please try this:
from bs4 import BeautifulSoup as bs
from urllib.request import urlopen
import pandas as pd
source = urlopen("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").read()
soup = bs(source, "html.parser")
table = soup.table
table_rows = table.find_all("tr")
postal_codes = []
for tr in table_rows:
    td = tr.find_all("td")
    # drop the last character of each cell's text (a trailing newline in this table)
    row = [i.text[:-1] for i in td]
    postal_codes.append(row)
# the first entry comes from the header row, which has no <td> cells, so remove it
postal_codes.pop(0)
df = pd.DataFrame(postal_codes, columns=['PostalCode', 'Borough', 'Neighborhood'])
print(df)
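If you would rather not hard-code the column names or drop the first row by position, one possible variant reads the names from the table's <th> cells. This is only a sketch: the inline HTML string stands in for the downloaded page so it runs offline, and table_to_df is a hypothetical helper name, not part of bs4 or pandas.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
HTML = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""


def table_to_df(html):
    """Convert the first <table> in `html` to a DataFrame (hypothetical helper)."""
    table = BeautifulSoup(html, "html.parser").table
    header, rows = None, []
    for tr in table.find_all("tr"):
        th = tr.find_all("th")
        td = tr.find_all("td")
        if th and not td:
            # Header row: use its cells as the column names.
            header = [cell.get_text(strip=True) for cell in th]
        elif td:
            # Data row: strip surrounding whitespace from each cell.
            rows.append([cell.get_text(strip=True) for cell in td])
    return pd.DataFrame(rows, columns=header)


df = table_to_df(HTML)
print(df)
```

To use it on the real page, pass the bytes returned by urlopen(...).read() instead of HTML. Rows without any <td> cells are skipped rather than popped, so it does not depend on the header being the first row.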
Upvotes: 1