Reputation: 97
I'm trying to get the URL, or href, from a webpage by web scraping with Scrapy. However, response.xpath('XPATH').extract() returns an empty list when I try to extract the href link.
The specific HTML element whose href I'm trying to get is:
<a href="#2020-38970" class="redNoticeItem__labelLink" data-singleurl="https://ws-public.interpol.int/notices/v1/red/2020-38970">MAGOMEDOVA<br>MADINA</a>
The result of the xpath command is an empty list: []
For context, I'm trying to follow each person's URL and extract the information from it, but I'm unable to retrieve the href from the web page.
I copied the full XPath of the HTML element, and it's: /html/body/div[1]/div[1]/div[6]/div/div[2]/div/div[2]/div[2]/div/div[2]/div/div/div[2]/div[1]/a.
But this still returns [] when I run the response.xpath command.
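In the Scrapy shell, the full command is essentially:

response.xpath('/html/body/div[1]/div[1]/div[6]/div/div[2]/div/div[2]/div[2]/div/div[2]/div/div/div[2]/div[1]/a').extract()
# -> []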
Upvotes: 0
Views: 2063
Reputation: 1484
You can simply use response.xpath("//a[@class='redNoticeItem__labelLink']").extract()
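If you want the link values rather than the whole <a> elements, a small variation should do it. Note that in the snippet from the question the href is only the fragment #2020-38970, while the full notice URL is in the data-singleurl attribute, so you can pull either one:

# href values (fragments like "#2020-38970")
hrefs = response.xpath("//a[@class='redNoticeItem__labelLink']/@href").extract()
# full notice URLs from the data-singleurl attribute
urls = response.xpath("//a[@class='redNoticeItem__labelLink']/@data-singleurl").extract()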
Upvotes: 0
Reputation: 1669
In this situation I personally wouldn't use XPath, or even Scrapy. I believe the simplest solution is to use BeautifulSoup and requests together.
from bs4 import BeautifulSoup
import requests

url = 'YOUR_URL_HERE'
# Download the page and parse the HTML
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
# Find every <a> tag on the page
links = soup.find_all('a')
# Collect the href of each link, skipping anchors that have none
urls = [a.get('href') for a in links if a.get('href')]
This code will give you the href of every link on the page in a list, and you can filter the list further by the class or whatever you need.
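For example, to keep only the red-notice links from the question's HTML (the class name and data-singleurl attribute come from the snippet there), something like this should work:

# filter by class and take the full notice URL from data-singleurl
notice_links = soup.find_all('a', class_='redNoticeItem__labelLink')
notice_urls = [a.get('data-singleurl') for a in notice_links]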
Upvotes: 2