Reputation: 51
Trying to teach myself some web scraping, just for fun. Decided to use it to look at a list of jobs posted on a website. I've gotten stuck. I want to be able to pull all the jobs listed on this page, but can't seem to get it to recognize anything deeper in the container I've made. Any suggestions are more than appreciated.
Current Code:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

myURL = 'https://jobs.collinsaerospace.com/search-jobs/'

# Download the page and close the connection
uClient = uReq(myURL)
page_html = uClient.read()
uClient.close()

# Parse the HTML; find_all returns a ResultSet of matching <section> tags
page_soup = soup(page_html, "html.parser")
container = page_soup.find_all("section", {"id": "search-results-list"})
container
Sample of the container:
<section id="search-results-list">
<ul>
<li>
<a data-job-id="12394447" href="/job/melbourne/test-technician/1738/12394447">
<h2>Test Technician</h2>
<span class="job-location">Melbourne, Florida</span>
<span class="job-date-posted">06/27/2019</span>
</a>
</li>
<li>
<a data-job-id="12394445" href="/job/cedar-rapids/associate-systems-engineer/1738/12394445">
<h2>Associate Systems Engineer</h2>
<span class="job-location">Cedar Rapids, Iowa</span>
<span class="job-date-posted">06/27/2019</span>
</a>
</li>
<li>
I'm trying to understand how to actually extract the h2-level information (or really any information within the container I've created).
Upvotes: 1
Views: 71
Reputation: 12915
If I understand correctly, you're looking to extract the headings from your container. Here's a snippet to do that:
for child in container:
    for heading in child.find_all('h2'):
        print(heading.text)
Note that child and heading are just dummy variables I'm using to iterate through the ResultSet (which is what container is) and the list of headings. For each child, I'm searching for all the h2 tags and printing the text of each one.
If you want to extract something else from your container, just tweak the arguments to find_all.
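For example, if you wanted the job locations instead of the headings, you could point find_all at the span with class job-location. A self-contained sketch using the sample markup from the question (no network request needed):

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
html = """
<section id="search-results-list">
<ul>
<li>
<a data-job-id="12394447" href="/job/melbourne/test-technician/1738/12394447">
<h2>Test Technician</h2>
<span class="job-location">Melbourne, Florida</span>
<span class="job-date-posted">06/27/2019</span>
</a>
</li>
</ul>
</section>
"""

page_soup = BeautifulSoup(html, "html.parser")
container = page_soup.find_all("section", {"id": "search-results-list"})

# Same loop shape as above, but targeting the location spans
locations = []
for child in container:
    for span in child.find_all("span", {"class": "job-location"}):
        locations.append(span.text)

print(locations)  # ['Melbourne, Florida']
```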
Upvotes: 2
Reputation: 578
I have tried to replicate the same result using lxml:
import requests
from lxml import html

resp = requests.get('https://jobs.collinsaerospace.com/search-jobs/')
data_root = html.fromstring(resp.content)

data = []
for node in data_root.xpath('//section[@id="search-results-list"]/ul/li'):
    data.append({
        "url": node.xpath('a/@href')[0],
        "name": node.xpath('a/h2/text()')[0],
        "location": node.xpath('a/span[@class="job-location"]/text()')[0],
        "posted": node.xpath('a/span[@class="job-date-posted"]/text()')[0],
    })
print(data)
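One caveat (my addition, not part of the answer above): each node.xpath(...)[0] raises an IndexError if a listing happens to be missing one of the fields. A defensive sketch with a hypothetical first() helper, run against a reduced copy of the sample markup in which the posted date is absent:

```python
from lxml import html

# Hypothetical helper: return the first xpath match, or a default if none
def first(node, path, default=""):
    matches = node.xpath(path)
    return matches[0] if matches else default

# Reduced sample: note there is no job-date-posted span here
sample = """
<html><body>
<section id="search-results-list"><ul>
<li><a data-job-id="12394447" href="/job/melbourne/test-technician/1738/12394447">
<h2>Test Technician</h2>
<span class="job-location">Melbourne, Florida</span>
</a></li>
</ul></section>
</body></html>
"""

root = html.fromstring(sample)
for node in root.xpath('//section[@id="search-results-list"]/ul/li'):
    job = {
        "url": first(node, 'a/@href'),
        "name": first(node, 'a/h2/text()'),
        "location": first(node, 'a/span[@class="job-location"]/text()'),
        "posted": first(node, 'a/span[@class="job-date-posted"]/text()'),  # missing -> ""
    }
    print(job)
```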
Upvotes: 2