Christian Read

Reputation: 143

How to extract text behind an "expand more" button using Scrapy?

In URL: https://teslamotorsclub.com/tmc/threads/tesla-tsla-the-investment-world-the-2019-investors-roundtable.139047/page-2619

Post #52365

In the browser I need to click "expand more" before I can see the full text. How can I get the text inside it? Is there a way to trigger "expand more" so that the whole text is shown while the spider script is running?

This is what I have tried so far:

info.xpath(".//div[@class = 'messageContent']").extract_first().replace('\n', '')

But I still cannot get the whole text.

Upvotes: 0

Views: 928

Answers (2)

Granitosaurus

Reputation: 21436

As someone pointed out in the comments, you don't need to click anything. If you open the document inspector in your browser, you can see that all of the text is already in the page source.

You can retrieve all of the messages with simple CSS selectors and a for loop:

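# `sel` is a Selector for the page; inside a spider callback, use `response` instead.
for post in sel.css('.messageList>li'):
    # Join every text node inside the post body into one string.
    text = ''.join(post.css('blockquote.messageText ::text').extract())
    print(text)
    print('------')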

Upvotes: 0

Gallaecio

Reputation: 3857

You are probably seeing the "Click to expand" text at the end, but still getting the whole quote. What you need is to avoid extracting the "Click to expand" text.

For example:

>>> response.xpath('//li[contains(@class, "message")][.//a/text()[.="#52365"]]//*[re:test(@class, "\\bquote\\b")]//text()').getall()
['CCS for model 3 coming', '\nWhile article references Europe, the North American theater will be getting a CCS adapter soon.', '\nSee article for', '\n', '\n', 'Tesla launches $190 CCS adapter for new Model S and Model X, offers retrofits for older vehicles', '\n', '\nMartian High Command', '\n', '\nPS: Text from article.', '\n', '\nUpdate: A Tesla spokesperson told us that they will make sure owners in North America will have access to all “compelling networks”, but they have nothing to announce now.']
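The list returned by getall() still contains whitespace-only nodes, and a broader selector may also pick up the widget label. As a minimal sketch, the fragments could be cleaned with plain Python before joining. The helper name join_fragments and the exact label string "Click to expand..." are assumptions for illustration, so check the page source for the real wording. The sample list is shortened from the output above, with the label appended by hand.

```python
def join_fragments(fragments, skip=("Click to expand...",)):
    """Join text nodes into one string, dropping blanks and widget labels."""
    cleaned = []
    for fragment in fragments:
        text = fragment.strip()
        # Skip whitespace-only nodes and any label listed in `skip`.
        if not text or text in skip:
            continue
        cleaned.append(text)
    return "\n".join(cleaned)


# Shortened sample of the fragments shown above, plus the assumed label.
fragments = [
    "CCS for model 3 coming",
    "\nWhile article references Europe, the North American theater will be getting a CCS adapter soon.",
    "\n",
    "\nMartian High Command",
    "Click to expand...",
]
result = join_fragments(fragments)
print(result)
```

In a spider callback, the same helper could be applied to the result of the XPath query above before yielding the item.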

Upvotes: 1
