python BeautifulSoup soup.findAll(), how to make search result match

Question

I recently learned BeautifulSoup, and as an exercise I want to use it to read job postings and extract the company and location of each one. Here is my code:

import urllib
from BeautifulSoup import *

url="http://www.indeed.com/jobs?q=hadoop&start=50"
html=urllib.urlopen(url).read()
soup=BeautifulSoup(html)
company=soup.findAll("span",{"class":"company"})
location=soup.findAll("span",{"class":"location"})

# for c in company:
#   print c.text
# print 
# for l in location:
#   print l.text

print len(company)
print len(location)

I found that the lengths of company and location are not the same, so I can't tell which (company, location) pair is incomplete. How can I make them match?
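To see why the two counts diverge, here is a minimal reproduction on a small hard-coded snippet. The markup is hypothetical (not Indeed's actual page), and the sketch assumes Python 3 with BeautifulSoup 4. Searching the whole document once per class produces two independent flat lists, so a result with no location silently shifts every later location by one position:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question: three result blocks,
# the second of which has no location span.
SAMPLE_HTML = """
<div class="result"><span class="company">Acme</span><span class="location">Austin, TX</span></div>
<div class="result"><span class="company">Globex</span></div>
<div class="result"><span class="company">Initech</span><span class="location">Denver, CO</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Searching the whole document independently yields two unrelated flat lists.
companies = [s.get_text(strip=True) for s in soup.find_all("span", {"class": "company"})]
locations = [s.get_text(strip=True) for s in soup.find_all("span", {"class": "location"})]

print(len(companies), len(locations))  # 3 2

# zip() pairs by position and stops at the shorter list, so Globex is
# paired with Initech's location and Initech is dropped entirely.
naive_pairs = list(zip(companies, locations))
print(naive_pairs)  # [('Acme', 'Austin, TX'), ('Globex', 'Denver, CO')]
```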

alecxe · Accepted Answer

You need to iterate over the individual search result blocks and get the company-location pair from each block:

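for result in soup.find_all("div", {"class": "result"}):  # or soup.select("div.result")
    company = result.find("span", {"class": "company"})
    location = result.find("span", {"class": "location"})

    # a block may lack one of the spans; use None instead of crashing on .get_text()
    company = company.get_text(strip=True) if company is not None else None
    location = location.get_text(strip=True) if location is not None else None

    print(company, location)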
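Here is a self-contained sketch of the same per-block approach, run on hypothetical sample markup instead of the live page (Python 3, BeautifulSoup 4). text_or_none and extract_pairs are illustrative helper names, not part of BeautifulSoup. Because each lookup is scoped to one result block, a missing span comes back as None and every pair stays aligned with its own result:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second result block has no location span.
SAMPLE_HTML = """
<div class="result"><span class="company">Acme</span><span class="location">Austin, TX</span></div>
<div class="result"><span class="company">Globex</span></div>
<div class="result"><span class="company">Initech</span><span class="location">Denver, CO</span></div>
"""


def text_or_none(block, class_name):
    """Hypothetical helper: stripped text of the first <span class=class_name> in block, or None."""
    span = block.find("span", {"class": class_name})
    return span.get_text(strip=True) if span is not None else None


def extract_pairs(html):
    """Hypothetical helper: one (company, location) tuple per div.result block."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for result in soup.find_all("div", {"class": "result"}):
        # The search is limited to this block, so pairs cannot drift apart.
        pairs.append((text_or_none(result, "company"), text_or_none(result, "location")))
    return pairs


pairs = extract_pairs(SAMPLE_HTML)
print(pairs)  # [('Acme', 'Austin, TX'), ('Globex', None), ('Initech', 'Denver, CO')]
```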

You should also switch to BeautifulSoup 4; the version you are using (BeautifulSoup 3) is quite old and no longer developed:

pip install beautifulsoup4

And replace:

from BeautifulSoup import *

with:

from bs4 import BeautifulSoup

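Run against the Indeed search page under Python 2 (where print(company, location) prints a tuple), the loop above printed: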

(u'PsiNapse', u'San Mateo, CA')
(u'Videology', u'Baltimore, MD')
(u'Charles Schwab', u'Lone Tree, CO')
(u'Cognizant', u'Dover, NH')
...
(u'Concur', u'Bellevue, WA')
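Putting the pieces together, a minimal Python 3 port of the question's script might look like the sketch below. It assumes Python 3 (where urlopen lives in urllib.request rather than urllib) and BeautifulSoup 4, and it uses the CSS selector alternative, soup.select("div.result"), mentioned in the loop's comment. fetch_html and company_locations are hypothetical helper names, and Indeed's current markup may no longer use these class names, so treat the selectors as assumptions:

```python
# Python 3: urlopen lives in urllib.request (Python 2 used urllib.urlopen).
from urllib.request import urlopen

from bs4 import BeautifulSoup

URL = "http://www.indeed.com/jobs?q=hadoop&start=50"


def fetch_html(url):
    """Hypothetical helper: download a page and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def company_locations(html):
    """Hypothetical helper: one (company, location) pair per div.result block.

    A missing span yields None, so each pair stays tied to its own block.
    """
    # Naming the parser explicitly avoids the warning bs4 emits when none is given.
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for result in soup.select("div.result"):
        company = result.select_one("span.company")
        location = result.select_one("span.location")
        pairs.append((
            company.get_text(strip=True) if company is not None else None,
            location.get_text(strip=True) if location is not None else None,
        ))
    return pairs
```

To run it against the live page, loop over company_locations(fetch_html(URL)) and print each company and location. Keeping the download separate from the parsing also lets you test the parser on saved HTML without hitting the network.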
