Reputation: 8488
I am using Python 3 and Scrapy. This is part of my HTML:
<div class="class=a1">
<span class="a-small">TEXT <a class="a-nm" href="/a/b=data1?ie=UTF8&what-i-want=Nice+Home&the-data=correct&text=ABA+DNA&sort=yes">That's Correct</a>
</span>
</div>
In the href there is this text: what-i-want. I would like to find the Nice+Home, which is everything after what-i-want= and before the next & in the href.
I tried this to first extract the href:
the_href = response.xpath('//a[contains(@href, "what-i-want")]/@href')
I expected it to return
/a/b=data1?ie=UTF8&what-i-want=Nice+Home&the-data=correct&text=ABA+DNA&sort=yes
so that I can then extract the Nice+Home from it, but it doesn't work.
How can I do this?
This is the output I see for the_href:
[<Selector xpath='//a[contains(@href, "what-i-want")]/@href' data='/a/b=data1?ie=UTF8&t'>,
<Selector xpath='//a[contains(@href, "what-i-want")]/@href' data='/a/b=data2?ie=UTF8&t'>,
<Selector xpath='//a[contains(@href, "what-i-want")]/@href' data='/a/b=data3?ie=UTF8&t'>,
<Selector xpath='//a[contains(@href, "what-i-want")]/@href' data='/a/b=data4?ie=UTF8&t'>]
Upvotes: 1
Views: 987
Reputation: 52665
response.xpath('//a[contains(@href, "what-i-want")]')
returns a list of link nodes (selectors), not strings. If you want the href attribute values as a list of strings, call .extract() on the result:
the_href = response.xpath('//a[contains(@href, "what-i-want")]/@href').extract()
Then you can extract the required values like this:
for href in the_href:
    # Take everything after "what-i-want=" and cut it off at the next "&"
    print(href.split("what-i-want=")[-1].split("&")[0])
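If the parameter might be missing from some links, the split approach above silently returns the start of the href instead of failing. As a rough alternative, here is a minimal sketch using only the standard library. The helper names raw_param and decoded_param are made up for this example, and the href is copied from the question. Note that parse_qs decodes the value, so + becomes a space; use the regex variant if you need the literal Nice+Home.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Sample href taken from the question's HTML.
href = "/a/b=data1?ie=UTF8&what-i-want=Nice+Home&the-data=correct&text=ABA+DNA&sort=yes"


def raw_param(href, name):
    """Hypothetical helper: return the still URL-encoded value of `name`, or None."""
    match = re.search(r"[?&]" + re.escape(name) + r"=([^&#]*)", href)
    return match.group(1) if match else None


def decoded_param(href, name):
    """Hypothetical helper: return the URL-decoded value of `name`, or None."""
    query = urlsplit(href).query
    values = parse_qs(query).get(name)
    return values[0] if values else None


raw = raw_param(href, "what-i-want")
decoded = decoded_param(href, "what-i-want")
missing = raw_param(href, "not-there")

print(raw)      # Nice+Home
print(decoded)  # Nice Home
print(missing)  # None
```

In a spider you would call one of these helpers on each string in the_href (the list returned by .extract()) in place of the split expression.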
Upvotes: 2