Reputation: 15
I'm looking to scrape just the job description from this page: https://www.aha.io/company/careers/current-openings/customer_success_specialist_project_management_us
I'd like to get all of the text and HTML inside the div with the class of "container py2 content job", EXCEPT the button. It's in an <a> tag with the class of "btn btn-large btn-secondary".
I've got two different XPath selectors that I thought would work, but neither does. The first doesn't exclude the button, and the second strips out all of the other HTML, which I'd like to keep.
response.xpath('//div[@class="container py2 content job"][not(parent::a/@class="btn btn-large btn-secondary")]').extract()
response.xpath('//div[@class="container py2 content job"]/descendant::text()[not(parent::a/@class="btn btn-large btn-secondary")]').extract()
Neither is scraping all of the HTML in the div minus what's inside the a tag. I'm hoping there's something simple that I'm missing, but I can't find what I'm looking for in the documentation.
Upvotes: 0
Views: 731
Reputation: 1035
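# Every element under div.content, each serialized to its own HTML string
# (nested elements also appear inside their ancestors' strings).
job_html = response.css('div.content *').extract()
# Drop any string containing the button label, assumed here to be "Apply now".
job_html = [x for x in job_html if "Apply now" not in x]
print(job_html)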
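As for why the two selectors in the question fall short: in the first, the not(...) predicate is tested against the div itself, whose parent is not an <a>, so the whole div (button included) is returned; in the second, descendant::text() returns only text nodes, so the markup is lost. If you want the div's HTML as a single string with the button gone, one option is to delete the <a> from the parsed tree before serializing. Below is a minimal sketch using lxml.html directly rather than Scrapy's selectors. The SAMPLE markup and the job_html_without_button helper are made up for illustration, and the real page may be structured differently.

```python
from lxml import html

# Hypothetical stand-in for the job page; the real markup may differ.
SAMPLE = """
<html><body>
<div class="container py2 content job">
  <h1>Customer Success Specialist</h1>
  <p>Help customers <strong>succeed</strong>.</p>
  <p><a class="btn btn-large btn-secondary" href="/apply">Apply now</a></p>
</div>
</body></html>
"""


def job_html_without_button(page_source):
    """Return the job div's HTML with the apply button removed (illustrative helper)."""
    tree = html.fromstring(page_source)
    # Take the first div whose class attribute matches exactly.
    job = tree.xpath('//div[@class="container py2 content job"]')[0]
    # Delete each matching <a> and its children from the tree.
    for button in job.xpath('.//a[@class="btn btn-large btn-secondary"]'):
        button.drop_tree()
    # Serialize what is left of the div back to an HTML string.
    return html.tostring(job, encoding="unicode")


result = job_html_without_button(SAMPLE)
print(result)
```

Inside a spider callback you would presumably pass response.text instead of SAMPLE. Note that the <p> that wrapped the button stays behind as an empty element.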
Upvotes: 1