Reputation: 433
I'm new to web scraping, so I'm experimenting with Scrapy and trying to crawl a certain website.
I'm working in the Scrapy shell on Windows and just trying to establish the proper XPath to a particular element I want to access. The element is a schedule; this is the HTML:
I'm trying to access the rv-schedule-module and all its sub-nodes. I can access every node down to and including the rv-schedule-module, but any XPath call that goes below it returns null. For instance:
The progression of calls returns data until I try to access a div underneath the rv-schedule-module. That call returns null.
What am I doing wrong?
Upvotes: 1
Views: 300
Reputation: 5240
Just as I suspected: that content is created dynamically by JavaScript!
When you inspect the element in the browser it will be there, but if you check the page source it won't. Scrapy only sees the page source, because it doesn't execute JavaScript by itself. You'll need something like scrapy-splash or Selenium to render the page first.
There is a really good post by the almighty Alex explaining how to use it: https://stackoverflow.com/a/30378765/2781701
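If you want to confirm this for yourself, here is a minimal sketch using only the Python standard library. It checks whether an element has anything nested inside it in a given HTML string. The two sample strings and the helper names (`DescendantCounter`, `count_descendants`) are made up for illustration; they are not the real page or part of Scrapy. The first string stands in for the page source Scrapy downloads, the second for what the browser inspector shows after JavaScript has run.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the raw page source (what Scrapy receives).
RAW_SOURCE = (
    '<html><body><div id="main">'
    '<rv-schedule-module></rv-schedule-module>'
    '</div><script src="app.js"></script></body></html>'
)

# Hypothetical stand-in for the DOM after JavaScript has filled it in.
RENDERED_DOM = (
    '<html><body><div id="main">'
    '<rv-schedule-module><div class="schedule"><span>9:00</span></div>'
    '</rv-schedule-module>'
    '</div></body></html>'
)


class DescendantCounter(HTMLParser):
    """Counts start tags nested inside the first <target> element."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.found = False      # True once the target element has been seen
        self.depth = 0          # > 0 while we are inside the target element
        self.descendants = 0    # start tags seen while inside the target

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.descendants += 1
            if tag == self.target:
                self.depth += 1
        elif tag == self.target and not self.found:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.target:
            self.depth -= 1


def count_descendants(html, target):
    """Return (target_found, number_of_elements_nested_inside_it)."""
    parser = DescendantCounter(target)
    parser.feed(html)
    parser.close()
    return parser.found, parser.descendants


print(count_descendants(RAW_SOURCE, "rv-schedule-module"))
print(count_descendants(RENDERED_DOM, "rv-schedule-module"))
```

In the Scrapy shell you could pass `response.text` in place of `RAW_SOURCE`. If the element is found but has no descendants, the sub-nodes are being added client-side and no XPath against the raw response will reach them.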
Upvotes: 2