Scrape span title

Question

I'm new to web scraping and I'm practicing by scraping Indeed. I've run into a problem: I want to scrape only the job titles, but my code also picks up the "new" label span. Below is my code:

from bs4 import BeautifulSoup as bs
import requests

def extract(page):

  url = f'https://ph.indeed.com/jobs?q=python+developer&l=Manila&start={page}'
  r = requests.get(url)
  soup = bs(r.content,'lxml')
  return soup

def transform(soup):
  results = soup.find_all('div',class_='slider_container')
  for item in results:
    job_title=item.find('span').text
    print(job_title)

c = extract(0)
transform(c)

When I run the code, the output is:

new
new
Python Developer
Python Developer
new
Jr. Python Developer
Python Developer
Python Developer
Python Developer
new
new
Junior Web Developer (Web Scraping)
new
Junior Web Developer Fullstack
Back End Developer (Work-from-Home)

The expected output should contain only the job titles, without 'new':

Python Developer
Python Developer
Jr. Python Developer
Python Developer
Python Developer
Python Developer
Junior Web Developer (Web Scraping)
Junior Web Developer Fullstack
Back End Developer (Work-from-Home)
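For context on why this happens: Tag.find('span') returns only the first matching <span> inside each container, in document order. On postings that show a "new" badge, that first span is the badge rather than the title. Here is a small self-contained reproduction. It uses hypothetical, simplified HTML, not Indeed's actual markup, and the built-in "html.parser" so lxml is not required:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (not Indeed's real HTML): the second card
# has a "new" badge span in front of its title span.
HTML = """
<div class="slider_container"><h2><span title="Python Developer">Python Developer</span></h2></div>
<div class="slider_container"><h2><span class="label">new</span><span title="Jr. Python Developer">Jr. Python Developer</span></h2></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() stops at the first matching descendant, so in card 2 the badge wins.
first_spans = [card.find("span").text for card in soup.find_all("div", class_="slider_container")]
print(first_spans)  # ['Python Developer', 'new']
```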

AziMez · Accepted Answer

You can add an if condition that skips the span when its text is 'new'.

Try this one:

from bs4 import BeautifulSoup as bs
import requests

def extract(page):

  url = f'https://ph.indeed.com/jobs?q=python+developer&l=Manila&start={page}'
  r = requests.get(url)
  soup = bs(r.content,'lxml')
  return soup

def transform(soup):
  results = soup.find_all('div', class_='slider_container')
  for item in results:
    job_title = item.find('span').text
    if job_title != 'new':  # <<<----- Edited line here!
      print(job_title)

c = extract(0)
transform(c)
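If you want to try the filter without hitting the site, here is a minimal offline sketch. It runs on hypothetical, simplified HTML (the question doesn't show Indeed's real markup, and that markup changes over time) and returns lists instead of printing. Note the trade-off: skipping the "new" span drops those postings' titles altogether, which is what the expected output in the question shows. If you would rather get a title for every posting, the second function is one option. It assumes the title span carries a title attribute, so check that against the live page before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (not Indeed's real HTML): the second card
# has a "new" badge span in front of its title span.
HTML = """
<div class="slider_container"><h2><span title="Python Developer">Python Developer</span></h2></div>
<div class="slider_container"><h2><span class="label">new</span><span title="Jr. Python Developer">Jr. Python Developer</span></h2></div>
"""


def titles_skip_new(soup):
    """Accepted answer's approach: take each card's first span and drop it if it is the 'new' badge."""
    titles = []
    for item in soup.find_all("div", class_="slider_container"):
        text = item.find("span").text.strip()  # strip surrounding whitespace before comparing
        if text != "new":
            titles.append(text)
    return titles


def titles_by_attribute(soup):
    """Alternative: pick the span that carries a title attribute (assumed to exist on the title span)."""
    titles = []
    for item in soup.find_all("div", class_="slider_container"):
        span = item.find("span", title=True)  # title=True matches any span that has a title attribute
        if span is not None:
            titles.append(span.text.strip())
    return titles


soup = BeautifulSoup(HTML, "html.parser")
print(titles_skip_new(soup))      # ['Python Developer']
print(titles_by_attribute(soup))  # ['Python Developer', 'Jr. Python Developer']
```

With the live page, you would pass the soup returned by extract(0) to either function instead of the hardcoded HTML.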
