Reputation: 348
The webpage shows that there are 702 Comments.
target youtube sample
I write a function get_total_youtube_comments(url)
,many codes copied from the project on github.
def get_total_youtube_comments(url):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument("--headless")
driver = webdriver.Chrome(options=options,executable_path='/usr/bin/chromedriver')
wait = WebDriverWait(driver,60)
driver.get(url)
SCROLL_PAUSE_TIME = 2
CYCLES = 7
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.PAGE_DOWN)
html.send_keys(Keys.PAGE_DOWN)
time.sleep(SCROLL_PAUSE_TIME * 3)
for i in range(CYCLES):
html.send_keys(Keys.END)
time.sleep(SCROLL_PAUSE_TIME)
comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
return all_comments
Try to parse all comments on a sample webpage https://www.youtube.com/watch?v=N0lxfilGfak
.
url='https://www.youtube.com/watch?v=N0lxfilGfak'
list = get_total_youtube_comments(url)
It can get some comments ,only small party of all comments.
len(list)
60
60
is much less than 702
,how to get all comments in youtube with selenium?
@supputuri,i can extract all comments with your code.
comments_list = driver.find_elements_by_xpath("//*[@id='content-text']")
len(comments_list)
709
print(driver.find_element_by_xpath("//h2[@id='count']").text)
717 Comments
comments_list[-1].text
'mistake at 23:11 \nin NOT it should return false if x is true.'
comments_list[0].text
'Got a question on the topic? Please share it in the comment section below and our experts will answer it for you. For Edureka Python Course curriculum, Visit our Website: Use code "YOUTUBE20" to get Flat 20% off on this training.'
Why the comments number is 709 instead of 717 shown in page?
Upvotes: 6
Views: 3703
Reputation: 337
The first thing you need to do is scroll down the video page to load all comments:
$actualHeight = 0;
$nextHeight = 0;
while (true) {
try {
$nextHeight += 10;
$actualHeight = $this->driver->executeScript('return document.documentElement.scrollHeight;');
if ($nextHeight >= ($actualHeight - 50 ) ) break;
$this->driver->executeScript("window.scrollTo(0, $nextHeight);");
$this->driver->manage()->timeouts()->implicitlyWait = 10;
} catch (Exception $e) {
break;
}
}
Upvotes: 0
Reputation: 193088
There are a couple of things:
https://www.youtube.com/watch?v=N0lxfilGfak
the comments doesn't render unless user scrolls the following element within the Viewport.The comments are with in:
<!--css-build:shady-->
Which applies, Polymer CSS Builder is used apply Polymer's CSS Mixin shim and ShadyDOM scoping. So some runtime work is still done to convert CSS selectors under the default settings.
Considering the above mentioned factors here's a solution to retrieve all the comments:
Code Block:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException, ElementClickInterceptedException, WebDriverException
import time
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.youtube.com/watch?v=N0lxfilGfak')
driver.execute_script("return scrollBy(0, 400);")
subscribe = WebDriverWait(driver, 60).until(EC.visibility_of_element_located((By.XPATH, "//yt-formatted-string[text()='Subscribe']")))
driver.execute_script("arguments[0].scrollIntoView(true);",subscribe)
comments = []
my_length = len(WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//yt-formatted-string[@class='style-scope ytd-comment-renderer' and @id='content-text'][@slot='content']"))))
while True:
try:
driver.execute_script("window.scrollBy(0,800)")
time.sleep(5)
comments.append([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//yt-formatted-string[@class='style-scope ytd-comment-renderer' and @id='content-text'][@slot='content']")))])
except TimeoutException:
driver.quit()
break
print(comment)
Upvotes: 5
Reputation: 1252
I'm not familiar with python, but I'll tell you the steps that I would do to get all comments. First of all, if your code I think the main issue is with the
CYCLES = 7
According to this, you will be scrolling for 2 seconds 7 times. Since you are successfully grabbing 60 comments, fixing the above condition will solve your issue.
I assume you don't have any issue in finding elements on a website using locators.
You need to get the total comments to count to a variable as an int. (in your case, let's say it's COMMENTS = 715)
Define another variable called VISIBLECOUNTS = 0
The use a while loop to scroll if the COMMENTS > VISIBLECOUNTS
The code might look like this ( really sorry if there are syntax issues )
// python - selenium command to get all comments counts.
COMMENTS = 715
(715 is just a sample value, it will change upon the total comments count)
VISIBLECOUNTE = 0
SCROLL_PAUSE_TIME = 2
while VISIBLECOUNTS < COMMENTS :
html.send_keys(Keys.END)
time.sleep(SCROLL_PAUSE_TIME)
VISIBLECOUNTS = len(driver.find_elements_by_xpath('//ytm-comment-thread-renderer'))
With this, you will be scrolling down until the COMMENTS = VISIBLECOUNTS. Then you can grab all the comments as all of them share the same element attributes such as ytm-comment-thread-renderer
Since I'm not familiar with python I'll add the command to get the comments to count from js. you can try this on your browser and convert it into your python command
Run the bellow queries in your console and check.
To get total comments count var comments = document.querySelector(".comment-section-header-text").innerText.split(" ") //We can get the text value "Comments • 715" and split by spaces and get the last value Number(comments[comments.length -1]) //Then convirt string "715" to int, you just need to do these in python - selenium To get active comments count $x("//ytm-comment-thread-renderer").length
Note: if it's hard to extract the values you still can use the selenium js executor and do the scrolling with js until all the comments are visible. But I guess it's not hard to do it in python since the logic is the same.
I'm really sorry about not being able to add the solution in python. But hope this helped. cheers.
Upvotes: 4
Reputation: 106
If you don't have to use Selenium I would recommend you to look at the google/youtube api.
https://developers.google.com/youtube/v3/getting-started
Example :
This would give you the first 100 results and gets you a token that you can append on the next request to get the next 100 results.
Upvotes: 4
Reputation: 14135
You are getting a limited number of comments as YouTube will load the comments as you keep scrolling down. There are around 394 comments left on that video you have to first make sure all the comments are loaded and then also expand all View Replies
so that you will reach the max comments count.
Note: I was able to get 700 comments using the below lines of code.
# get the last comment
lastEle = driver.find_element_by_xpath("(//*[@id='content-text'])[last()]")
# scroll to the last comment currently loaded
lastEle.location_once_scrolled_into_view
# wait until the comments loading is done
WebDriverWait(driver,30).until(EC.invisibility_of_element((By.CSS_SELECTOR,"div.active.style-scope.paper-spinner")))
# load all comments
while lastEle != driver.find_element_by_xpath("(//*[@id='content-text'])[last()]"):
lastEle = driver.find_element_by_xpath("(//*[@id='content-text'])[last()]")
driver.find_element_by_xpath("(//*[@id='content-text'])[last()]").location_once_scrolled_into_view
time.sleep(2)
WebDriverWait(driver,30).until(EC.invisibility_of_element((By.CSS_SELECTOR,"div.active.style-scope.paper-spinner")))
# open all replies
for reply in driver.find_elements_by_xpath("//*[@id='replies']//paper-button[@class='style-scope ytd-button-renderer'][contains(.,'View')]"):
reply.location_once_scrolled_into_view
driver.execute_script("arguments[0].click()",reply)
time.sleep(5)
WebDriverWait(driver, 30).until(
EC.invisibility_of_element((By.CSS_SELECTOR, "div.active.style-scope.paper-spinner")))
# print the total number of comments
print(len(driver.find_elements_by_xpath("//*[@id='content-text']")))
Upvotes: 7