Reputation: 3433
I want to extract all the text in a specific webpage.
In JavaScript (PhantomJS) the code looks like this:
var webPage = require('webpage');
var page = webPage.create();
page.open('http://phantomjs.org', function (status) {
    console.log('Stripped down page text:\n' + page.plainText);
    phantom.exit();
});
How can I run page.plainText in Python?
Thanks.
Upvotes: 3
Views: 21226
Reputation: 3471
If you want to do that with Selenium, you have to select the "top" element (body) and then read its text: getText() in the Java bindings, or the .text property in Python.
For example, in Python:
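from selenium import webdriver

# Selenium 3 API: later Selenium 4 releases drop the PhantomJS driver and the find_element_by_* helpers
driver = webdriver.PhantomJS(executable_path='pathTo/phantomjs')
driver.get('https://en.wikipedia.org/wiki/Selenium_(software)')
# <body> is the top-level visible element, so its text is the text of the whole page
el = driver.find_element_by_tag_name('body')
print(el.text)
driver.quit()  # quit() ends the session and the browser process; close() only closes the window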
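For newer Selenium versions, the same idea can be wrapped in a small helper. This is only a sketch: the names extract_page_text and text_with_headless_chrome are made up for illustration, and the second function assumes Chrome and a matching chromedriver are installed. It uses the generic find_element(by, value) call, where "tag name" is the string value of By.TAG_NAME.

```python
def extract_page_text(driver, url):
    """Load url in an already-created WebDriver and return the visible page text.

    Hypothetical helper, not part of Selenium itself.
    """
    driver.get(url)
    # "tag name" is the value of selenium's By.TAG_NAME constant
    body = driver.find_element("tag name", "body")
    return body.text


def text_with_headless_chrome(url):
    """Assumes Chrome and chromedriver are available on this machine."""
    # Imported here so extract_page_text stays usable with any driver object
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        return extract_page_text(driver, url)
    finally:
        # Always shut the browser down, even if loading the page fails
        driver.quit()
```

Calling text_with_headless_chrome('https://en.wikipedia.org/wiki/Selenium_(software)') should return roughly what page.plainText gives in PhantomJS, though whitespace handling may differ between browsers.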
Upvotes: 10
Reputation: 6459
Try reading the innerText attribute of the body element:
text = driver.find_element_by_tag_name("body").get_attribute("innerText")
Upvotes: 2