BeautifulSoup: href link is undefined

Question

I want to scrape a website, but whenever I reach an anchor tag its link is "job/undefined". I use a POST request to fetch the data from the page.

The POST request and its post data are in this code:

from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"}

postData = {
  'search': 'search',
  'facets[camp_type]':'day_camp',
  'open[choices-made-content]': 'true'}

url = 'https://www.trustme.work/en'
html_1 = requests.post(url, headers=headers, data=postData)

soup1 = BeautifulSoup(html_1.text, 'lxml')
a = soup1.select('div.MuiGrid-root MuiGrid-grid-xs-12 ')
b = soup1.select('span[class="MuiTypography-root MuiTypography-h2"]')
print('soup:',b)

Sample from the output:

Network and Security engineer

HedgeHog · Accepted Answer

EDIT

Part of the content is served dynamically, so you have to fetch each job's hashid via the API and then build the link yourself, or use the data from the JSON response directly:

import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"}
url = 'https://api.trustme.work/api/job_offers?include=technologies%2Cjob%2Ccompany%2Ccontract_type%2Clevel'
jobs = requests.get(url, headers=headers).json()['included']['jobs']

# Build one absolute URL per job from its hashid
links = ['https://www.trustme.work/job/' + v['hashid'] for v in jobs.values()]
print(links)
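
If you want to check the link-building step without hitting the network, here is a minimal sketch. It assumes the response has the shape used above (an `included` object containing a `jobs` mapping whose values carry a `hashid`). The helper name `build_job_links` and the sample payload are made up for illustration, they are not part of the site's API.

```python
JOB_BASE_URL = "https://www.trustme.work/job/"


def build_job_links(payload, base=JOB_BASE_URL):
    """Return one absolute job URL per entry that has a hashid.

    Assumes payload looks like {"included": {"jobs": {key: {"hashid": ...}}}}.
    Entries without a hashid are skipped instead of raising KeyError.
    """
    jobs = payload.get("included", {}).get("jobs", {})
    return [base + job["hashid"] for job in jobs.values() if job.get("hashid")]


# Invented sample data; the real response carries many more fields.
sample_payload = {
    "included": {
        "jobs": {
            "1": {"hashid": "abc123"},
            "2": {"hashid": "def456"},
            "3": {},  # entry with no hashid, skipped
        }
    }
}

links = build_job_links(sample_payload)
print(links)
```

With the live API you would pass `requests.get(url, headers=headers).json()` as `payload`.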

To get the links from each job post, make your CSS selector more specific. Also prefer static identifiers or the HTML structure over generated class names:

.select('h2 a')
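
To see why the selector from the question matches nothing, here is a small sketch on a hand-written HTML fragment. The markup and the `abc123` id are invented, and the real page may be structured differently. In a CSS selector a space is a descendant combinator, so `div.MuiGrid-root MuiGrid-grid-xs-12` looks for a `<MuiGrid-grid-xs-12>` element inside the div. Two classes on the same element have to be chained with dots.

```python
from bs4 import BeautifulSoup

# Invented fragment that mimics a job card.
html = """
<div class="MuiGrid-root MuiGrid-grid-xs-12">
  <h2><a href="/job/abc123">Network and Security engineer</a></h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Space = descendant combinator: searches for a tag named MuiGrid-grid-xs-12.
broken = soup.select("div.MuiGrid-root MuiGrid-grid-xs-12")

# Chained classes: one div that carries both classes.
chained = soup.select("div.MuiGrid-root.MuiGrid-grid-xs-12")

# Structural selector: does not depend on generated class names at all.
structural = soup.select("h2 a")

print(len(broken), len(chained), structural[0].get("href"))
```

The structural selector is usually the more robust choice, since generated class names can change between builds of the site.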

To get a list of all links use a list comprehension:

['https://www.trustme.work' + a.get('href') for a in soup1.select('h2 a')]
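
Note that `a.get('href')` returns `None` when an anchor has no `href`, and then the string concatenation raises a `TypeError`. A slightly more defensive sketch, assuming the unfilled placeholder looks like the `job/undefined` from the question (the helper name `absolute_job_links` and the sample hrefs are made up):

```python
from urllib.parse import urljoin

BASE_URL = "https://www.trustme.work"


def absolute_job_links(hrefs, base=BASE_URL):
    """Turn relative hrefs into absolute URLs, dropping unusable ones.

    Skips missing hrefs (None or empty) and placeholders ending in
    'undefined', which the page shows before the job data is loaded.
    """
    links = []
    for href in hrefs:
        if not href or href.rstrip("/").endswith("undefined"):
            continue
        links.append(urljoin(base, href))
    return links


# Invented sample values standing in for a.get('href') results.
sample_hrefs = ["/job/abc123", "job/undefined", None, "/job/def456"]
links = absolute_job_links(sample_hrefs)
print(links)
```

With the soup from your code you would call it as `absolute_job_links(a.get('href') for a in soup1.select('h2 a'))`.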

Example

from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"}

postData = {
 'search': 'search',
 'facets[camp_type]':'day_camp',
 'open[choices-made-content]': 'true'}

url = 'https://www.trustme.work/en'
html_1 = requests.post(url, headers=headers, data=postData)

soup1 = BeautifulSoup(html_1.text, 'lxml')
# Collect the absolute link of every job heading
links = ['https://www.trustme.work' + a.get('href') for a in soup1.select('h2 a')]
print(links)
