TJ1

Reputation: 8488

Python Scrapy: finding a text in an "href"

I am using Python 3 and Scrapy. This is part of my HTML:

<div class="class=a1">
  <span class="a-small">TEXT <a class="a-nm" href="/a/b=data1?ie=UTF8&amp;what-i-want=Nice+Home&amp;the-data=correct&amp;text=ABA+DNA&amp;sort=yes">That's Correct
  </span>
</div>

The href contains the parameter what-i-want. I would like to extract its value, Nice+Home, which is everything after what-i-want= and before the next &amp; in the href.

I tried this to first extract the href:

the_href = response.xpath('//a[contains(@href, "what-i-want")]/@href')

I expected it to return

/a/b=data1?ie=UTF8&amp;what-i-want=Nice+Home&amp;the-data=correct&amp;text=ABA+DNA&amp;sort=yes

so that I can then extract Nice+Home from it, but it doesn't work.

How can I do this?

Update:

This is what I see when I print the_href:

[<Selector xpath='//a[contains(@href, "what-i-want")]/@href' data='/a/b=data1?ie=UTF8&t'>, 
<Selector xpath='//a[contains(@href, "what-i-want")]/@href' data='/a/b=data2?ie=UTF8&t'>, 
<Selector xpath='//a[contains(@href, "what-i-want")]/@href' data='/a/b=data3?ie=UTF8&t'>, 
<Selector xpath='//a[contains(@href, "what-i-want")]/@href' data='/a/b=data4?ie=UTF8&t'>]

Upvotes: 1

Views: 987

Answers (1)

Andersson

Reputation: 52665

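response.xpath('//a[contains(@href, "what-i-want")]') returns a list of link nodes. Adding /@href selects the attribute, but the result is still a list of Selector objects rather than strings. To get the href values as a list of strings, call .extract() on it: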

the_href = response.xpath('//a[contains(@href, "what-i-want")]/@href').extract()

Then you can extract the required value from each href like this:

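for href in the_href:
    # Take everything after "what-i-want=" and cut at the next "&".
    # Splitting on "&" works whether the separator is a decoded "&" or a literal "&amp;".
    print(href.split("what-i-want=")[-1].split("&")[0])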
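As an alternative to splitting the string by hand, the query string can be parsed with the standard library. This is only a minimal sketch: it assumes each href is a plain string whose separators are already `&` (as the truncated `data='/a/b=data1?ie=UTF8&t'` output in the question suggests), and `query_value` is a hypothetical helper name, not part of Scrapy.

```python
from urllib.parse import urlparse, parse_qs


def query_value(href, name):
    """Return the first value of query parameter `name` in `href`, or None."""
    # Everything after "?" is the query string.
    query = urlparse(href).query
    # parse_qs maps each parameter name to a list of its values.
    values = parse_qs(query).get(name)
    return values[0] if values else None


# Sample href taken from the question, with "&" as the separator.
href = "/a/b=data1?ie=UTF8&what-i-want=Nice+Home&the-data=correct&text=ABA+DNA&sort=yes"
print(query_value(href, "what-i-want"))
```

Note that `parse_qs` decodes the value, so `+` becomes a space and this yields `Nice Home` rather than `Nice+Home`. If the raw, still-encoded text is needed, the split-based approach above keeps it unchanged.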

Upvotes: 2
