Reputation: 135
I am quite new to Scrapy and am building a web scraper to pull certain information from GoFundMe, specifically, in this case, the number of people who have donated to a project. I have written an XPath expression that works fine in Chrome but returns null in Scrapy.
A random example project is https://www.gofundme.com/f/passage/donations, which at present has 22 donations. Entering the expression below in Chrome's inspector gives me "Donations(22)", which is what I need:
//h2[@class="heading-5 mb0"]/text()
However, in my Scrapy spider the following yields null:
import scrapy


class DonationsSpider(scrapy.Spider):
    name = 'get_donations'
    start_urls = [
        'https://www.gofundme.com/f/passage/donations'
    ]

    def parse(self, response):
        # Works in Chrome's inspector, but yields None here
        amount_of_donations = response.xpath('//h2[@class="heading-5 mb0"]/text()').extract_first()
        yield {
            'Donations': amount_of_donations
        }
Does anyone know why Scrapy is unable to see this value?
I need this value to work out how many times the rest of the spider has to loop. When I hard-code it, the spider works with no problems and yields all of the donations.
Upvotes: 0
Views: 113
Reputation: 5491
That page is assembled from many separate requests, not just the one to "https://www.gofundme.com/f/passage/donations".
Chrome executes the page's JavaScript, which fetches data from several other endpoints and fills in the page, so what you see in the inspector is not what Scrapy downloads.
One of those requests goes to the endpoint "https://gateway.gofundme.com/web-gateway/v1/feed/passage/counts", which returns the data you are looking for. Scrapy does not execute JavaScript, so it never makes that request, and rendering the page just to trigger it is not recommended.
Instead, you can call that API directly. The good news is that the endpoint responds with JSON, which is structured and easy to parse.
You are probably also after the data returned by this endpoint: "https://gateway.gofundme.com/web-gateway/v1/feed/passage/donations?limit=20&offset=0&sort=recent"
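Since the question is about how many times the spider has to loop, here is a minimal sketch of turning a known donation total into the list of page URLs to request. It assumes the `offset` parameter advances in steps of `limit`, which the URL above suggests but which I have not verified against the API. The helper name `donation_page_urls` and the constant `FEED_BASE` are invented for this example.

```python
from urllib.parse import urlencode

# Base path taken from the endpoints quoted in this answer.
FEED_BASE = 'https://gateway.gofundme.com/web-gateway/v1/feed'


def donation_page_urls(slug, total, limit=20):
    """Return one donations URL per page needed to cover `total` donations.

    Assumption: pages are addressed by `offset` in multiples of `limit`.
    """
    urls = []
    for offset in range(0, total, limit):
        query = urlencode({'limit': limit, 'offset': offset, 'sort': 'recent'})
        urls.append(f'{FEED_BASE}/{slug}/donations?{query}')
    return urls


# 22 donations at 20 per page need two requests: offsets 0 and 20.
for url in donation_page_urls('passage', total=22):
    print(url)
```

In a spider, each of these URLs could be passed to `scrapy.Request`, so the number of requests follows from the count and nothing has to be hard-coded.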
Upvotes: 1