Reputation: 195
I am very new to scrapy and am trying to scrape posts on reddit. To help, I have accessed the scrapy shell and am trying to dig out the posts. The page I am using is https://www.reddit.com/r/news/comments/6a4ie8/philippines_senator_tells_un_reports_of_drug_war/
I have viewed the source and have found the following data I want to access:
class="usertext-body may-blank-within md-container "><div class="md"><p>It seems to me that the senator was using the term "alternative facts" the opposite way Conway used them. He used them to discred... etc.
Why do I get an empty array when I type response.xpath('//div[@class="md"]').extract()? I also get empty arrays when trying to access much of the other data on this page through the shell.
Many thanks in advance
Upvotes: 1
Views: 161
Reputation: 2061
If you want to access the text of every post, you can use this XPath:
response.xpath('//form[contains(@id, "form-t1")]//div//div//p/text()').extract()
You can learn more about xpaths here: Scrapy Selectors
Finally, here is a very useful tool if you want to test XPaths: Videlibri. Paste the HTML you want to parse into the left textarea and your XPath into the right one. You can then easily test your expression.
Hope this helps.
Upvotes: 0
Reputation: 1947
Try this with response.css or response.xpath; avoid relying on the form id, as it seems to change:
>>> response.css('div.entry form div.usertext-body div.md p ::text').extract_first()
'It seems to me that the senator was using the term "alternative facts" the opposite way Conway used them. He used them to discredit the interpretation of said "facts" as lies, insisting that many of the homicides being counted as extra-judicial killings were just regular homicides.'
>>>
>>> response.xpath("//div[contains(@class, 'entry')]/form/div/div/p[1]/text()").extract_first()
'It seems to me that the senator was using the term "alternative facts" the opposite way Conway used them. He used them to discredit the interpretation of said "facts" as lies, insisting that many of the homicides being counted as extra-judicial killings were just regular homicides.'
Upvotes: 0