Manoj Rammoorthy

Reputation: 1430

Scraping a table: how to fetch td-level data based on raw_input

from bs4 import BeautifulSoup
import urllib2

url = "en.wikipedia.org/wiki/ISO_3166-1"
r = urllib2.urlopen("http://" +url)
soup = BeautifulSoup(r)

#tables = soup.findAll("table")
#i want to fetch data of india and store in a variable
t = soup.find("table")
for t1 in t.find_all('tr'):
  #for cell in t1.find_all('td'):
  cell = t1.find_all('td')
  shortname = cell[0].string
  alpha2 = cell[1].a.string
  #print cell.find_all(text=True)
  print shortname
  #cells = t.find_all('td',text="India")
  #rn = cells[0].get_text()
  #print cells
  #soup.find_all('a')
  #title = soup.a
  #title

The commented-out lines show the different things I tried before I could get any data. The wiki table lists each country's name along with its codes; I want to fetch the codes for a country based on user input.

Upvotes: 0

Views: 232

Answers (2)

Stephen Lin

Reputation: 4912

Using HTMLParser, you can pull the text out of an HTML page and then search it. Here is one way to do it.

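from HTMLParser import HTMLParser
import requests
import re

class MyHTMLParser(HTMLParser):

    def __init__(self):
        # call the base initializer explicitly, then create a per-instance list
        # (a class-level list would be shared by every parser object)
        HTMLParser.__init__(self)
        self.data = []

    def handle_data(self, data):
        # keep only text nodes that contain a letter, '-' or ':'
        if re.search('[a-zA-Z-:]', data):
            self.data.append(data)

if __name__ == '__main__':

    url = 'http://en.wikipedia.org/wiki/ISO_3166-1'
    rsp = requests.get(url)

    p = MyHTMLParser()

    p.feed(rsp.text)

    # slice out the kept text nodes between the first and last table entries
    s = p.data[p.data.index('Afghanistan'):p.data.index('ISO 3166-2:ZW')+1]

    name = raw_input('please input country name: ')
    # print the entry three kept text nodes after the country name
    # (s.index raises ValueError if the name is not in the list)
    print s[s.index(name)+3]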
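
The flat list of text nodes above depends on a fixed offset from the country name. As a rough sketch of an alternative, the parser can group cell text by row so that a code is picked by column instead. This is written for Python 3 (where the module is `html.parser`), unlike the Python 2 snippet above. `TableRowParser`, `lookup` and `SAMPLE_HTML` are names and data made up for illustration, and the column order (name, alpha-2, alpha-3, numeric) is an assumption about the table layout. It does not handle nested tables.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # only text inside a <td> is kept
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # rows without any <td> (e.g. headers) are skipped
                self.rows.append(self._row)
            self._row = None


# Hypothetical sample standing in for the downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Alpha-2</th><th>Alpha-3</th><th>Numeric</th></tr>
  <tr><td><a href="#">India</a></td><td>IN</td><td>IND</td><td>356</td></tr>
  <tr><td><a href="#">Japan</a></td><td>JP</td><td>JPN</td><td>392</td></tr>
</table>
"""


def lookup(html, country, column):
    """Return the text in `column` of the row whose first cell matches `country`."""
    parser = TableRowParser()
    parser.feed(html)
    for row in parser.rows:
        if row[0].lower() == country.lower() and column < len(row):
            return row[column]
    return None


print(lookup(SAMPLE_HTML, "india", 3))
```

With the real page you would pass `rsp.text` in place of `SAMPLE_HTML`, after checking which column holds the code you want.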

Upvotes: 0

mnjeremiah

Reputation: 281

This prompts the user for the country they want to look up and returns its 3-digit code. If you enter a name it can't find, it returns None.

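import requests
from bs4 import BeautifulSoup
session = requests.session()


def fetchCode(country):
    page = session.get('http://en.wikipedia.org/wiki/ISO_3166-1')
    # name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(page.text, 'html.parser').find('table', {'class': 'wikitable'})
    tablerows = soup.findAll('tr')
    for tr in tablerows:
        td = tr.findAll('td')
        # skip rows that have no <td> cells, such as a header row
        if td:
            # compare names case-insensitively, ignoring surrounding whitespace
            if td[0].text.strip().lower() == country.strip().lower():
                return td[3].text.strip()
    # no matching row: the function falls through and returns None


print fetchCode(raw_input('Enter Country Name:'))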
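
If you need more than one code per country, or several lookups in a row, one option is to build a dictionary from the rows once and query that instead of walking the table each time. The sketch below is Python 3 and works on rows that have already been reduced to lists of cell text. `build_code_index`, `fetch_codes` and `SAMPLE_ROWS` are hypothetical names and data, and the column order (name, alpha-2, alpha-3, numeric) is an assumption about the table.

```python
def build_code_index(rows):
    """Map lowercase country name -> dict of its codes."""
    index = {}
    for cells in rows:
        if len(cells) < 4:
            continue  # skip header or malformed rows
        name, alpha2, alpha3, numeric = (c.strip() for c in cells[:4])
        index[name.lower()] = {
            "alpha2": alpha2,
            "alpha3": alpha3,
            "numeric": numeric,
        }
    return index


def fetch_codes(index, country):
    """Return the codes for `country`, or None if it is not in the index."""
    return index.get(country.strip().lower())


# Hypothetical rows standing in for the text of each <td> in the table.
SAMPLE_ROWS = [
    ["India", "IN", "IND", "356"],
    ["Japan", "JP", "JPN", "392"],
    ["Header only"],
]

index = build_code_index(SAMPLE_ROWS)
print(fetch_codes(index, " INDIA "))
```

With the function above, the rows could come from `[[c.text for c in tr.findAll('td')] for tr in tablerows]`.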

Upvotes: 1
