Reputation: 4036
When I use driver.page_source I get the full page source. Is there any way to get only a specific part of the HTML?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
chrome_options = webdriver.ChromeOptions()
driver = webdriver.Chrome(executable_path="/selenium/chromedriver", options=chrome_options)
driver.get("https://news.creaders.net/us/2021/01/27/2315313.html")
content = driver.page_source
This returns the HTML of the whole page, but I only need the HTML inside <div id="newsContent"> </div>:
<div id="newsContent">
<p></p><p>content</p><p style="text-align: center;"><img src="https://pub.creaders.net/upload_files/image/202101/20210127_16117914118079.png" title="20210127_16117914118079.png" alt="image.png"></p>
</div>
Upvotes: 0
Views: 208
Reputation: 229
Try running your HTML output through the BeautifulSoup parser.
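from bs4 import BeautifulSoup
# Parse the page source from your Selenium driver; naming the parser avoids a bs4 warning
soup = BeautifulSoup(driver.page_source, 'html.parser')
div = soup.find('div', id='newsContent')
# Join the div's children to get only the HTML inside it (Python 3 print)
print(''.join(map(str, div.contents)))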
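If you want something reusable, here is a minimal sketch of the same idea wrapped in a function. The helper name inner_html and the sample string are my own assumptions for illustration (the sample stands in for driver.page_source so it runs without a browser). It also returns None when the id is not on the page, rather than failing on div.contents.

```python
from bs4 import BeautifulSoup


def inner_html(page_html, element_id):
    """Return the markup inside the element with the given id, or None if absent.

    Hypothetical helper for illustration; not part of Selenium or BeautifulSoup.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    element = soup.find(id=element_id)
    if element is None:
        return None
    # decode_contents() serializes only the children, without the wrapping tag.
    return element.decode_contents()


# Stand-in for driver.page_source (an assumption, so no browser is needed here).
sample = (
    '<html><body>'
    '<div id="newsContent"><p>content</p></div>'
    '<div id="other">x</div>'
    '</body></html>'
)
print(inner_html(sample, "newsContent"))
```

In your script you would call inner_html(driver.page_source, "newsContent"). If you would rather skip BeautifulSoup, Selenium can also hand back an element's inner markup directly by locating the element by id and reading its "innerHTML" attribute with get_attribute.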
Upvotes: 1