Reputation: 15
I'm a retired programmer but new to Scrapy. Actually, this is my first Python project, so I could be doing something wrong.
I installed Scrapy under Anaconda and started a shell with:
scrapy shell "https://sailing-channels.com/by-subscribers"
It looks like everything is working fine, and I can get some queries to work.
Here is my problem: when I enter:
response.css('body').extract()
I get:['<body><noscript>If you\'re seeing this message, that means <strong>JavaScript has been disabled on your browser</strong>, please <strong>enable JS</strong> to make this app work.</noscript><div id="app"></div><script src="//apis.google.com/js/platform.js" async></script><script>!function(e,a,n,t,g,c,i){e.GoogleAnalyticsObject="ga",e.ga=e.ga||function(){(e.ga.q=e.ga.q||[]).push(arguments)},e.ga.l=1*new Date,c=a.createElement(n),i=a.getElementsByTagName(n)[0],c.async=1,c.src="//www.google-analytics.com/analytics.js",i.parentNode.insertBefore(c,i)}(window,document,"script"),ga("create","UA-15981085-17","auto"),ga("require","linkid"),ga("set","anonymizeIp",!0),ga("send","pageview")</script><script type="application/ld+json">{\n\t\t\t"@context": "http://schema.org",\n\t\t\t"@type": "Organization",\n\t\t\t"name": "Sailing Channels"\n\t\t\t"url": "https://www.sailing-channels.com",\n\t\t\t"logo": "https://sailing-channels.com/img/banner.png",\n\t\t\t"sameAs" : [\n\t\t\t\t"https://www.facebook.com/sailingchannels",\n\t\t\t\t"https://twitter.com/sailchannels"\n\t\t\t]\n\t }</script><script type="text/javascript" src="https://cdn.sailing-channels.com/1.15.9/main.1dad65fcb7a507930e1f.js"></script></body>']
I expected a lot more. When I inspect the page in Chrome, I see many more div elements inside <div id="app"></div>.
Could someone shed some light on what I'm doing wrong? I want to scrape the channel name, subscriber count, and views.
Thanks
Upvotes: 1
Views: 303
Reputation: 939
Understandable. This happens because the site renders its data with JavaScript while the page loads, so the HTML that Scrapy downloads only contains the empty <div id="app"></div> placeholder.
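One way to confirm this is to check whether the placeholder element is empty in the raw HTML. The following is only a minimal sketch using the standard library; the names MountPointChecker and is_empty_mount_point are made up for illustration, and the id "app" is taken from the output in the question.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MountPointChecker(HTMLParser):
    """Counts the direct child elements of the element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0          # 0 = outside the target element
        self.found = False      # True once the target element was seen
        self.child_count = 0    # direct children of the target element

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            if self.depth == 1:
                self.child_count += 1
            if tag not in VOID_TAGS:
                self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


def is_empty_mount_point(html, element_id="app"):
    """Return True if the element exists but has no child elements."""
    checker = MountPointChecker(element_id)
    checker.feed(html)
    return checker.found and checker.child_count == 0
```

In the Scrapy shell you could call is_empty_mount_point(response.text). If it returns True, the content you see in Chrome is being added later by JavaScript.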
With Scrapy's default settings, dynamically loaded content does not appear in the response. To scrape that data you can use Selenium.
selenium-with-scrapy-for-dynamic-page
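As a rough sketch of the Selenium approach (not the only way to do it), you can let a real browser load the page, wait until #app has children, and then hand the rendered HTML to Scrapy's selectors. The function names below are hypothetical, the sketch assumes Selenium 4 and Chrome are installed, and the polling loop stands in for Selenium's own wait helpers to keep the example short.

```python
import time


def make_headless_chrome():
    """Create a headless Chrome driver (needs `pip install selenium` and Chrome)."""
    # Imported here so the rest of the sketch loads without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def fetch_rendered_html(driver, url, timeout=10.0, poll=0.25):
    """Load url and return the page source once #app has child elements."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # "css selector" is the locator string behind By.CSS_SELECTOR.
        if driver.find_elements("css selector", "#app > *"):
            return driver.page_source
        time.sleep(poll)
    raise TimeoutError(f"#app was still empty after {timeout} seconds")


# Usage (not executed here because it needs a browser):
#   driver = make_headless_chrome()
#   try:
#       html = fetch_rendered_html(
#           driver, "https://sailing-channels.com/by-subscribers")
#   finally:
#       driver.quit()
#   from scrapy.selector import Selector
#   body = Selector(text=html).css("#app")
```

The CSS selectors for channel name, subscriber count, and views depend on the rendered markup, so you would need to look those up in Chrome's inspector.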
Alternatively, you can use Splash to handle JavaScript-rendered content.
handling-javascript-in-scrapy-with-splash
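A minimal sketch of the Splash route, assuming you already have a Splash instance listening on http://localhost:8050 (its usual default): Splash exposes a render.html HTTP endpoint that returns the page after JavaScript has run. The helper names and the 2 second wait are illustrative choices, not something from the links above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_splash_url(page_url, wait=2.0, splash_base="http://localhost:8050"):
    """Build a render.html URL asking Splash to load page_url and wait."""
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


def fetch_via_splash(page_url, wait=2.0,
                     splash_base="http://localhost:8050", timeout=30):
    """Return the JavaScript-rendered HTML for page_url (needs Splash running)."""
    splash_url = build_splash_url(page_url, wait, splash_base)
    with urlopen(splash_url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

You should also be able to pass the URL from build_splash_url straight to scrapy shell, so that response.css(...) works on the rendered page. For a full spider, the scrapy-splash package described in the link is the more usual choice.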
Upvotes: 2