Dan

Reputation: 33

JSON data web scraping

I am trying to scrape job titles from the Epicor job search page (https://jobs.epicor.com/search-jobs).

Using BeautifulSoup I can scrape the job titles from the first page, but not from the remaining pages. In the browser's developer tools (Network tab) I found that the later pages are loaded by a request whose response is JSON.

import requests
from bs4 import BeautifulSoup

s = requests.Session()
# Headers copied from the browser request shown in Developer tools > Network
headers = {
    'Connection': 'keep-alive',
    'Accept': '*/*',
    'X-Requested-With': 'XMLHttpRequest',
    'sec-ch-ua-mobile': '?0',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36',
    'Content-Type': 'application/json; charset=utf-8',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
    'Referer': 'https://jobs.epicor.com/search-jobs',
    'Accept-Language': 'en-US,en;q=0.9',
}
url = 'https://jobs.epicor.com/search-jobs/results?ActiveFacetID=0&CurrentPage=2&RecordsPerPage=15&Distance=50&RadiusUnitType=0&Keywords=&Location=&ShowRadius=False&IsPagination=False&CustomFacetName=&FacetTerm=&FacetType=0&SearchResultsModuleName=Search+Results&SearchFiltersModuleName=Search+Filters&SortCriteria=0&SortDirection=1&SearchType=5&PostalCode=&fc=&fl=&fcf=&afc=&afl=&afcf='

# .json() already returns a dict, so no json.dumps/json.loads round trip is needed
response = s.get(url, headers=headers).json()
for key in response:
    print(key)
# How do I extract the job titles ("jobtitle") from this JSON?

The problem is that the values in this JSON response contain HTML markup. How can I extract the job titles from it?

Would really appreciate any help on this.

Unfortunately, I am limited to requests or other popular Python libraries. Thanks in advance.

Upvotes: 1

Views: 62

Answers (1)

crayxt

Reputation: 2405

If the job titles are all you need, parse the HTML stored inside the JSON response:

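from bs4 import BeautifulSoup

# (your code from the question goes here; it sets `response` to the parsed JSON)

# The "results" value is an HTML fragment, so parse it like a normal page
soup = BeautifulSoup(response["results"], "html.parser")
for item in soup.find_all("span", class_="jobtitle"):
    print(item.get_text(strip=True))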
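To check the parsing step without hitting the live site, here is a self-contained sketch run against a made-up payload. It assumes, as the answer does, that the HTML lives under the "results" key and that each title is a <span class="jobtitle">; the sample markup and the helper name extract_job_titles are hypothetical, and the real page's markup may differ.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical sample payload shaped like the answer assumes: an HTML fragment
# stored as a string under the "results" key of the JSON response.
sample_body = json.dumps({
    "results": (
        "<ul>"
        '<li><a href="/job/1"><span class="jobtitle">Software Engineer</span></a></li>'
        '<li><a href="/job/2"><span class="jobtitle">Product Manager</span></a></li>'
        "</ul>"
    ),
    "hasJobs": True,
})


def extract_job_titles(payload: dict) -> list[str]:
    """Parse the HTML fragment in payload["results"] and return the job titles."""
    soup = BeautifulSoup(payload["results"], "html.parser")
    # CSS selector equivalent of find_all("span", class_="jobtitle")
    return [span.get_text(strip=True) for span in soup.select("span.jobtitle")]


titles = extract_job_titles(json.loads(sample_body))
print(titles)  # ['Software Engineer', 'Product Manager']
```

With the live site, pass the dict returned by s.get(url, headers=headers).json() instead of the sample.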

To go through the other pages, hover the mouse cursor over the Prev or Next button on the page: the link shows the URL to request data from. In the URL from the question, the page number appears to be the CurrentPage query parameter.
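Instead of reading each page URL off the buttons, the page number can be changed in code. The sketch below assumes that the results endpoint from the question accepts any CurrentPage value, that every response has a "results" key, and that a page past the last one contains no span.jobtitle elements; page_url, scrape_all_titles, and the max_pages cap are hypothetical names, not part of the site's API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

import requests
from bs4 import BeautifulSoup

# Results URL copied from the question; only CurrentPage is changed below.
SEARCH_URL = (
    "https://jobs.epicor.com/search-jobs/results?ActiveFacetID=0&CurrentPage=2"
    "&RecordsPerPage=15&Distance=50&RadiusUnitType=0&Keywords=&Location="
    "&ShowRadius=False&IsPagination=False&CustomFacetName=&FacetTerm=&FacetType=0"
    "&SearchResultsModuleName=Search+Results&SearchFiltersModuleName=Search+Filters"
    "&SortCriteria=0&SortDirection=1&SearchType=5&PostalCode=&fc=&fl=&fcf=&afc=&afl=&afcf="
)


def page_url(url: str, page: int) -> str:
    """Return a copy of url with its CurrentPage query parameter set to page."""
    parts = urlsplit(url)
    # keep_blank_values preserves empty parameters such as Keywords= and Location=
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["CurrentPage"] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def scrape_all_titles(session: requests.Session, headers: dict,
                      max_pages: int = 50) -> list[str]:
    """Request pages 1..max_pages and collect the job titles from each one."""
    titles = []
    for page in range(1, max_pages + 1):
        response = session.get(page_url(SEARCH_URL, page), headers=headers, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.json()["results"], "html.parser")
        page_titles = [s.get_text(strip=True) for s in soup.select("span.jobtitle")]
        # Assumption: an empty page means we have gone past the last page.
        if not page_titles:
            break
        titles.extend(page_titles)
    return titles
```

With the headers dict from the question, a call would look like titles = scrape_all_titles(requests.Session(), headers). The max_pages cap guards against looping forever if the site keeps returning the last page instead of an empty one.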

Upvotes: 1
