SecureEntrepeneur

Reputation: 97

Python - trying to get URL (href) from web scraping using Scrapy

I'm trying to get the URL (the href) of a link on a webpage using Scrapy. However, response.xpath('XPATH').extract() returns an empty list when I try to extract the href.

The HTML page structure is: [image: inspecting the webpage for HTML structure]

The specific HTML element whose href I'm trying to get is: <a href="#2020-38970" class="redNoticeItem__labelLink" data-singleurl="https://ws-public.interpol.int/notices/v1/red/2020-38970">MAGOMEDOVA<br>MADINA</a>

The result of the xpath command is: [image: xpath command result returns empty]

For context, I'm trying to get the information in each person's URL and extract it, but I'm unable to retrieve the href from the web page.

I copied the full XPath of the HTML element, and it's: /html/body/div[1]/div[1]/div[6]/div/div[2]/div/div[2]/div[2]/div/div[2]/div/div/div[2]/div[1]/a

But this still returns [] when I run the response.xpath command.

Upvotes: 0

Views: 2063

Answers (2)

Chaithanya Krishna

Reputation: 1484

You can simply use response.xpath("//a[@class='redNoticeItem__labelLink']").extract()
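
Note that the expression above selects the whole <a> element. In Scrapy, appending /@href, as in response.xpath("//a[@class='redNoticeItem__labelLink']/@href").getall(), should return only the attribute values. As a rough way to check the class-based expression offline, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selector. It supports only a subset of XPath and needs well-formed XML, so the <br> from the question is written as <br/>; SAMPLE and notice_links are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Markup adapted from the question. <br/> is self-closed so the snippet is
# well-formed XML, which ElementTree requires (Scrapy itself is more lenient).
SAMPLE = (
    '<div>'
    '<a href="#2020-38970" class="redNoticeItem__labelLink" '
    'data-singleurl="https://ws-public.interpol.int/notices/v1/red/2020-38970">'
    'MAGOMEDOVA<br/>MADINA</a>'
    '<a href="/other" class="unrelated">other</a>'
    '</div>'
)


def notice_links(markup):
    """Return (href, data-singleurl) for every anchor with the notice class."""
    root = ET.fromstring(markup)
    # Same predicate as in the answer, written relative to the root element.
    anchors = root.findall(".//a[@class='redNoticeItem__labelLink']")
    return [(a.get("href"), a.get("data-singleurl")) for a in anchors]


print(notice_links(SAMPLE))
```

In this element the href is only a fragment (#2020-38970); the full address sits in data-singleurl, so that attribute may be the one worth extracting. If the same expression still returns [] inside Scrapy, one possible explanation (an assumption, not confirmed in this thread) is that the page inserts these anchors with JavaScript after loading, in which case they would not be present in the response Scrapy downloads.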

Upvotes: 0

stidmatt

Reputation: 1669

In this situation I personally wouldn't use XPath, or even Scrapy. I believe the simplest solution is to use BeautifulSoup and requests together.

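from bs4 import BeautifulSoup
import requests

url = "YOUR_URL_HERE"  # placeholder: replace with the page to scrape
# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(requests.get(url).text, "html.parser")
# href=True skips <a> tags without an href, which would raise KeyError below.
links = soup.find_all("a", href=True)
urls = [x["href"] for x in links]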

This code will give you the href of every link on the page in a list, and you can filter the list further by the class or whatever you need.
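
To illustrate the "filter by class" step, here is a minimal dependency-free sketch that uses the standard library's html.parser instead of BeautifulSoup (with BeautifulSoup, soup.find_all("a", class_="redNoticeItem__labelLink") should do the same job). ClassLinkParser, links_with_class, and SAMPLE_HTML are hypothetical names invented for this example, and the markup is the element quoted in the question.

```python
from html.parser import HTMLParser


class ClassLinkParser(HTMLParser):
    """Collect (href, data-singleurl) from <a> tags that carry a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated class names.
        classes = (attr_map.get("class") or "").split()
        if self.wanted_class in classes:
            self.found.append(
                (attr_map.get("href"), attr_map.get("data-singleurl"))
            )


def links_with_class(html, wanted_class):
    """Parse html and return the link attributes of matching anchors."""
    parser = ClassLinkParser(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.found


# Sample markup taken from the question, plus one unrelated link.
SAMPLE_HTML = (
    '<a href="#2020-38970" class="redNoticeItem__labelLink" '
    'data-singleurl="https://ws-public.interpol.int/notices/v1/red/2020-38970">'
    'MAGOMEDOVA<br>MADINA</a>'
    '<a href="/about">About</a>'
)

print(links_with_class(SAMPLE_HTML, "redNoticeItem__labelLink"))
```

Like requests with BeautifulSoup, this only sees the HTML that the server actually returns, so it would find nothing if the links are added to the page by JavaScript.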

Upvotes: 2
