Reputation: 6693
The HTML structure is like this:
<td class='hey'>
<a href="https://example.com">First one</a>
</td>
This is my selector:
m_URL = sel.css("td.hey a:nth-child(1)[href] ").extract()
My selector currently outputs <a href="https://example.com">First one</a>, but I only want it to output the link itself: https://example.com.
How can I do that?
Upvotes: 14
Views: 27202
Reputation: 473753
Use Scrapy's ::attr(name) pseudo-element, here ::attr(href), to get the attribute value from the a tag instead of the whole element.
Demo (using Scrapy shell):
$ scrapy shell index.html
>>> response.css('td.hey a:nth-child(1)::attr(href)').extract()
[u'https://example.com']
where index.html contains:
<table>
<tr>
<td class='hey'>
<a href="https://example.com">First one</a>
</td>
</tr>
</table>
Upvotes: 26
Reputation: 221
You can also chain an XPath expression onto the CSS selector:
m_URL = sel.css("td.hey a:nth-child(1)").xpath('@href').extract()
Upvotes: 6