Reputation: 349
I'm testing Scrapy and can't figure out how to retrieve plain text, without the tags, when the text is nested inside tags. Here is the URL I'm testing on: http://www.tripadvisor.com/ShowTopic-g293915-i3686-k8824646-What_s_the_coolest_thing_you_saw_or_did_in_Thailand-Thailand.html
Desired output: the content of each post as a separate element in item['body']
My code:
    import scrapy
    from tripadvisor.items import TripadvisorItem

    class TripadvisorSpider(scrapy.Spider):
        [...]

        def parse_thread_contents(self, response):
            url = response.url
            item = TripadvisorItem()
            for sel in response.xpath('//div[@class="balance"]'):
                item['body'] = sel.xpath('//div[@class="postBody"]//p').extract()
                yield item
Upvotes: 2
Views: 904
Reputation: 473763
You need to get the text() of the p elements. There is also a problem in the loop: you need to iterate over the posts one by one, get each post body, and collect the bodies in a list:
    item['body'] = ["".join(post.xpath('.//div[@class="postBody"]/p/text()').extract())
                    for post in response.xpath('//div[@class="postcontent"]')]
Note that the dot at the beginning of the inner expression is important: it makes the search relative to the current post instead of starting again from the document root.
Demo:
    In [1]: for post in response.xpath('//div[@class="postcontent"]'):
       ...:     print("".join(post.xpath('.//div[@class="postBody"]/p/text()').extract()))
       ...:
What's that memory you'll carry forever with you? Maybe you stayed on a floating hut in Khao Sok Lake, or you washed elephants in a sanctuary, or....I have no idea. Please share if you like, I'd love to hear!
The heat when you you go to for the first time, my blessing ceremony with my husband on Bottle Beach is up there, as is the first time I met him in Samui. Phang Nga Bay on the west coast is stunning and took my breath away, I overnighted on a friend's boat and watched the stars come out. Hong Island was amazing and arriving at Koh Racha before it had hotels on it. Early morning mist on the river at Amphawa whilst looking across to a beautiful temple, the Chao Praya River in Bangkok, the Reclining Buddha at Wat Pho - I could go on and on. : )
First trip to few years back. Not very informed, no smart phone, no google earth....rent a bike, with my wife and we just ride the bike "till the road ends"...ended up at their local uni, watch student going in and out of the uni gate, sat on the road side having a coke. No worries...just me and my wife.Cassnu, pls...go on and on...we dont mind.
...
Upvotes: 1