Reputation: 63
I am trying to use Scrapy to get the names of all current WWE superstars from the following url: http://www.wwe.com/superstars However, when I run my scraper, it does not return any names. I believe (through attempting the problem with other modules) that the problem is that Scrapy is not finding all of the html elements from the page. I attempted the problem with requests and Beautiful Soup, and when I looked at the html that requests got, it was missing important aspects of the html that I was seeing in my browsers inspector. The html containing the names looks like this:
<div class="superstars--info"> == $0
<span class="superstars--name">name here</span>
</div>
My code is posted below. Is there something that I am doing wrong that is causing this not to work?
import scrapy
class SuperstarSpider(scrapy.Spider):
name = "star_spider"
start_urls = ["http://www.wwe.com/superstars"]
def parse(self, response):
star_selector = '.superstars--info'
for star in response.css(star_selector):
NAME_SELECTOR = 'span ::text'
yield {
'name' : star.css(NAME_SELECTOR).extract_first(),
}
Upvotes: 0
Views: 1892
Reputation: 5107
Sounds like the site has dynamic content which maybe loaded using javascript and/or xhr calls. Look into splash it's a javascript render engine that behaves a lot like phantomjs. If you know how to use docker, splash is super simple to setup. After you have splash setup, you'll have to integrate it with scrapy by using the scrapy-splash plugin.
Upvotes: 2
Reputation: 8077
Since the content is javascript generated, you have two options: use something like selenium
to mimic a browser and parse the html content, or if you can, query an API directly.
In this case, this simple solution works:
import requests
import json
URL = "http://www.wwe.com/api/superstars"
with requests.session() as s:
s.headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0'}
resp = s.get(URL).json()
for x in resp['talent'][:10]:
print(x['name'])
Output (first 10 records):
Abdullah the Butcher
Adam Bomb
Adam Cole
Adam Rose
Aiden English
AJ Lee
AJ Styles
Akam
Akeem
Akira Tozawa
Upvotes: 1