Extracting data in table using BeautifulSoup

Question

I'm scraping this page for my Android app. I'd like to extract the data from the table of cities and area codes.

Here's my code:

from bs4 import BeautifulSoup
import urllib2
import re

base_url = "http://www.howtocallabroad.com/taiwan/"
html_page = urllib2.urlopen(base_url)
soup = BeautifulSoup(html_page)
codes = soup.select("#codes tbody > tr > td")
for area_code in codes:
    pass  # TODO: print each td's city and area code

I'd like to know which function in Python or in BeautifulSoup gets the value out of <td>value</td>.

Sorry, I'm just an Android dev learning to write Python.

TerryA · Accepted Answer

You can use findAll(), along with a function that breaks a list up into chunks:

>>> areatable = soup.find('table',{'id':'codes'})
>>> def chunks(l, n):
...     # split l into consecutive slices of length n
...     return [l[i:i+n] for i in range(0, len(l), n)]
...
>>> dict(chunks([i.text for i in areatable.findAll('td')], 2))
{u'Chunan': u'36', u'Penghu': u'69', u'Wufeng': u'4', u'Fengyuan': u'4', u'Kaohsiung': u'7', u'Changhua': u'47', u'Pingtung': u'8', u'Keelung': u'2', u'Hsinying': u'66', u'Chungli': u'34', u'Suao': u'39', u'Yuanlin': u'48', u'Yungching': u'48', u'Panchiao': u'2', u'Taipei': u'2', u'Tainan': u'62', u'Peikang': u'5', u'Taichung': u'4', u'Yungho': u'2', u'Hsinchu': u'35', u'Tsoying': u'7', u'Hualien': u'38', u'Lukang': u'47', u'Talin': u'5', u'Chiaochi': u'39', u'Fengshan': u'7', u'Sanchung': u'2', u'Tungkang': u'88', u'Taoyuan': u'33', u'Hukou': u'36'}

Explanation:

.find() finds a table with an id of codes. The chunks function is used to split up a list into evenly sized chunks.

As findAll returns a list of td tags in document order (city, code, city, code, and so on), we use chunks with n=2 on their text to create something like:

[[u'Changhua', u'47'], [u'Keelung', u'2'], etc]
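
To see what chunks does on its own, here is a small sketch that uses a hand-written list in place of the scraped cell text. The name cells and its sample values are only for illustration (the values are copied from the output above). It also runs on Python 3, where the strings print without the u prefix.

```python
def chunks(l, n):
    # Slice l into consecutive pieces of length n; the last piece may be shorter.
    return [l[i:i+n] for i in range(0, len(l), n)]

# Hypothetical stand-in for [i.text for i in areatable.findAll('td')]
cells = ['Changhua', '47', 'Keelung', '2', 'Taipei', '2']

pairs = chunks(cells, 2)
print(pairs)  # [['Changhua', '47'], ['Keelung', '2'], ['Taipei', '2']]
```

If the list had an odd number of items, the final chunk would hold a single element, so this approach relies on every row having exactly two cells.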

i.text for i in ... is used to get the text of each td tag; otherwise the <td> and </td> tags would remain.
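
The difference between a tag and its text can be checked without hitting the network. The sketch below parses a small inline table; the markup is a made-up sample that only assumes the real page uses a table with id codes and two cells per row. It uses find_all, the current bs4 spelling of findAll, and passes the parser name explicitly.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, assumed to resemble the real table
html = """
<table id="codes">
  <tr><td>Taipei</td><td>2</td></tr>
  <tr><td>Kaohsiung</td><td>7</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
areatable = soup.find('table', {'id': 'codes'})
cells = areatable.find_all('td')

as_markup = [str(td) for td in cells]  # keeps the surrounding <td> tags
as_text = [td.text for td in cells]    # only the text inside each cell

print(as_markup[0])  # <td>Taipei</td>
print(as_text)       # ['Taipei', '2', 'Kaohsiung', '7']
```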

Finally, dict() is called to convert the list of lists into a dictionary, which you can use to look up a city's area code.
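
As a rough illustration of that last step, the sketch below builds the dictionary from a few hand-written pairs and looks up a city. The pairs list and the name Atlantis are placeholders, not scraped data.

```python
# Hypothetical pairs, shaped like the output of chunks(..., 2)
pairs = [['Taipei', '2'], ['Kaohsiung', '7'], ['Tainan', '62']]

# dict() treats each two-item list as a key/value pair
codes = dict(pairs)

print(codes['Tainan'])        # prints 62
print(codes.get('Atlantis'))  # prints None; .get avoids a KeyError for unknown cities
```

Note that the area codes stay strings, because they come from the cell text; wrap them in int() if you need numbers.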
