liveinfootball

Reputation: 29

Scrape dynamic webpage for data using scrapy

I am trying to scrape some data from www.gbig.org. I was able to scrape the Activity summary, Activity details, and Why it's green sections; however, the response was empty when I crawled the LEED DASHBOARD section.

Specifically, I would like to get six scores (EA, MR, IEQ, SS, WE, and ID), but these values come back empty when I scrape with the following XPath.


scrapy shell "http://www.gbig.org/activities/leed-1000020523"
response.xpath("//*[@id='overview']/div[1]/div[1]/div/div[2]/div[2]/div[1]/div[1]/div/div/p[1]/text()").extract()

I found that this is because the values I want to scrape are loaded dynamically, but I have no idea how to get them. Could you please guide me on how to obtain these values?

Upvotes: 0

Views: 164

Answers (1)

Akshay Goyal

Reputation: 71

First of all, this website is pretty slow, so you need to increase the wait time when crawling it with Scrapy.

There are a few things you can experiment with to get the data you are looking for.

  1. Experiment with increasing the wait time.
  2. Crawl the website through Splash (run in Docker), which loads pages in a headless browser. This way the page's JavaScript is executed, so you will probably get the data you are looking for. Currently, you are crawling in scrapy shell, which gives you only the basic HTML without loading the JS and CSS from the target website, so it may not contain all the data.
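
As a rough sketch of the second option, the helper below builds a URL for Splash's `render.html` HTTP endpoint, which returns the page's HTML after JavaScript has run. It assumes a Splash container is already running locally on port 8050 (for example, started from the `scrapinghub/splash` Docker image); the function name `splash_render_url` and the default `wait` and `timeout` values are illustrative choices, not something taken from the site or from the question.

```python
from urllib.parse import urlencode

# Assumption: Splash is reachable locally, e.g. after running
#   docker run -p 8050:8050 scrapinghub/splash
SPLASH_RENDER_ENDPOINT = "http://localhost:8050/render.html"


def splash_render_url(target_url, wait=5.0, timeout=60.0,
                      endpoint=SPLASH_RENDER_ENDPOINT):
    """Return a Splash render.html URL for target_url (hypothetical helper).

    wait    -- seconds Splash waits after page load so scripts can finish
    timeout -- overall render timeout in seconds (must be larger than wait)
    """
    # urlencode percent-encodes the target URL so it survives as a query value.
    query = urlencode({"url": target_url, "wait": wait, "timeout": timeout})
    return f"{endpoint}?{query}"


if __name__ == "__main__":
    print(splash_render_url("http://www.gbig.org/activities/leed-1000020523"))
```

The printed URL can be passed to `scrapy shell "<url>"` in place of the original address, so the same `response.xpath(...)` call runs against the rendered HTML. If the scores are still missing, raising `wait` is the first thing to try, since the site is slow; the XPath itself may also need adjusting if the rendered markup differs from what the browser inspector shows.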

Hopefully, this solves your problem.

Upvotes: 1
