noggy

Reputation: 149

I am confused why this XPath selector does not work

I am learning to use Scrapy and experimenting with XPath selectors, so I decided to practice by scraping job titles from Craigslist.

Here is the HTML of a single job link from the Craigslist page I am trying to scrape the job titles from:

<a href="https://orangecounty.craigslist.org/sof/d/trabuco-canyon-full-stack-net-developer/7134827958.html" data-id="7134827958" class="result-title hdrlnk">Full Stack .NET C# Developer (Mid-Level, Senior) ***LOCAL ONLY***</a>

What I wanted to do was retrieve all of the similar a tags with the class result-title, so I used the XPath selector:

titles = response.xpath('//a[@class="result-title"/text()]').getall()

but the output I receive is an empty list: []

I was able to copy the XPath directly from Chrome's inspector, which ended up working perfectly and gave me a full list of job title names. This selector was:

titles = response.xpath('*//div[@id="sortable-results"]/ul/li/p/a/text()').getall()

I can see why this second XPath selector works, but I don't understand why my first attempt did not. Can someone explain why my first XPath selector failed? I have also provided a link to the full HTML of the Craigslist page below in case that is helpful or necessary. I am new to Scrapy and want to learn from my mistakes. Thank you!

view-source:https://orangecounty.craigslist.org/search/sof

Upvotes: 1

Views: 305

Answers (2)

Marsu

Reputation: 786

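Simply use '//a[@class="result-title hdrlnk"]/text()'

Two fixes were needed:

  • /text() must go outside the [] predicate, after the closing bracket.
  • The attribute test must use the full value "result-title hdrlnk", not just "result-title". XPath compares @class as a single string and, unlike a CSS class selector, does not split it into individual class names, so = only matches the exact attribute content.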
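To see the exact-match behaviour in isolation, here is a minimal sketch using only Python's standard library rather than Scrapy. The markup is a trimmed-down, hypothetical stand-in for the results list, and ElementTree only supports a subset of XPath (no /text(), so the text is read from the .text attribute), but its [@attr='value'] predicate compares the whole attribute string in the same way.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the results markup (not the real page).
SNIPPET = """
<ul>
  <li><p><a href="#" class="result-title hdrlnk">Full Stack .NET C# Developer</a></p></li>
  <li><p><a href="#" class="other">Not a job title</a></p></li>
</ul>
""".strip()

root = ET.fromstring(SNIPPET)

# Comparing against only one of the two class names: the strings differ,
# so the predicate is false for every <a> element.
partial = [a.text for a in root.findall(".//a[@class='result-title']")]

# Comparing against the complete attribute value: the job link matches.
exact = [a.text for a in root.findall(".//a[@class='result-title hdrlnk']")]

print(partial)
print(exact)
```

The first list comes back empty and the second contains the job title, which is the same effect as in the question once /text() is moved outside the brackets.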

Upvotes: -1

Gilles Quénot

Reputation: 185116

Like this:

'//a[contains(@class,"result-title ")]/text()'

Or:

'//a[starts-with(@class,"result-title ")]/text()'

I use contains() or starts-with() because the class of the a node is

result-title hdrlnk

not just

result-title

In your XPath:

'//a[@class="result-title"/text()]'

even if the class were just result-title, the syntax is wrong: /text() belongs after the closing bracket, not inside the predicate. You should use:

'//a[@class="result-title"]/text()'
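One caveat worth spelling out: contains(@class,"result-title ") is a substring test with a trailing space, so it depends on result-title being followed by another class. A commonly used order-independent XPath 1.0 idiom is '//a[contains(concat(" ", normalize-space(@class), " "), " result-title ")]/text()'. The sketch below mimics both tests in plain Python so the difference is easy to check; the helper names substring_match and token_match are made up for illustration and are not part of Scrapy.

```python
def substring_match(class_attr: str, name: str) -> bool:
    """Mimic contains(@class, "<name> "): a raw substring test with a trailing space."""
    return f"{name} " in class_attr


def token_match(class_attr: str, name: str) -> bool:
    """Mimic the concat/normalize-space idiom: is `name` one of the class tokens?"""
    return name in class_attr.split()


# A few class attribute values to compare the two tests on.
for value in ("result-title hdrlnk", "hdrlnk result-title", "my-result-title hdrlnk"):
    print(repr(value), substring_match(value, "result-title"), token_match(value, "result-title"))
```

For the class on this page both tests agree. They differ when result-title is the last (or only) class, where the substring test misses it, and when another class merely ends in result-title, where the substring test gives a false positive. In Scrapy itself, a CSS selector such as response.css('a.result-title::text').getall() should also match by class token, though I have not run it against this page.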

Upvotes: 3
