7edubs7

Reputation: 15

How to skip over child element with Scrapy

I'm looking to scrape just the job description from this page: https://www.aha.io/company/careers/current-openings/customer_success_specialist_project_management_us

I'd like to get all of the text and HTML inside the div with the class of "container py2 content job", EXCEPT the button. It's in an <a> tag with the class of "btn btn-large btn-secondary".

I've got two XPath selectors that I thought would work, but neither does. The first doesn't exclude the button, and the second strips out all of the other HTML, which I'd like to keep.

response.xpath('//div[@class="container py2 content job"][not(parent::a/@class="btn btn-large btn-secondary")]').extract()

response.xpath('//div[@class="container py2 content job"]/descendant::text()[not(parent::a/@class="btn btn-large btn-secondary")]').extract()

Neither is scraping all of the HTML in the div minus what's inside the a tag. I'm hoping there's something simple that I'm missing, but I can't find what I'm looking for in the documentation.

Upvotes: 0

Views: 731

Answers (1)

ThePyGuy

Reputation: 1035

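# Extract the HTML of every element nested inside div.content
job_html = response.css('div.content *').extract()
# Drop any fragment that contains the button's label text
job_html = [x for x in job_html if "Apply now" not in x]
print(job_html)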
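
Note that the string filter above also drops any parent element whose HTML contains the button, and nested elements show up more than once in the list. An alternative, sketched below and not tested against the live page, is to remove the <a> node from the parsed tree and then serialize the div. It uses lxml.html (which Scrapy already depends on); SAMPLE_HTML and job_html_without_button are made-up names for illustration, and the sample markup is only a guess at the page's structure.

```python
import lxml.html

# Hypothetical stand-in for the job page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
<div class="container py2 content job">
  <h1>Customer Success Specialist</h1>
  <p>Help customers <strong>succeed</strong>.</p>
  <p><a class="btn btn-large btn-secondary" href="/apply">Apply now</a></p>
</div>
</body></html>
"""


def job_html_without_button(page_html):
    """Return the job div's HTML with the apply button removed."""
    tree = lxml.html.fromstring(page_html)
    # Exact class-attribute match, same as in the question.
    div = tree.xpath('//div[@class="container py2 content job"]')[0]
    # Remove each matching <a> (and its children) from the tree in place.
    for button in div.xpath('.//a[@class="btn btn-large btn-secondary"]'):
        button.drop_tree()
    # Serialize what is left of the div, tags included.
    return lxml.html.tostring(div, encoding="unicode")


result = job_html_without_button(SAMPLE_HTML)
print(result)
```

Inside a spider callback you would pass response.text instead of SAMPLE_HTML. The rest of the markup (headings, paragraphs, inline tags) is kept, and only the button is gone.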

Upvotes: 1
