sangeetha
sangeetha

Reputation: 53

Scraping youtube all the comments and its replies with selenium in python

I'm trying to scrape the youtube video comments and its replies, comment likes, comment dislikes, comment count, reply count.

First I tried to scrape the text data like comments and its reply with selenium google drivers in python based on id.

I can able to scrape only the comments which is available in the page and not its replies.

Replies are not able to achieve.

import time
import csv
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

chrome_path = "/Users/Downloads/chromedriver"
page_url = "https://www.youtube.com/watch?v=AJesAlohO6I&t=" 


driver = webdriver.Chrome(executable_path=chrome_path)
driver.get(page_url)
time.sleep(2)  


title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
print(title)


SCROLL_PAUSE_TIME = 2
CYCLES = 100

html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.PAGE_DOWN)  
html.send_keys(Keys.PAGE_DOWN)  
time.sleep(SCROLL_PAUSE_TIME * 3)

for i in range(CYCLES):
    html.send_keys(Keys.END)
    time.sleep(SCROLL_PAUSE_TIME)


comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
print(all_comments)

write_file = "output_testing.csv"
with open(write_file, "w") as output:
    for line in all_comments:
        output.write(line + '\n')

With this above code I can able to scrape only the comments. How to scrape reply of those comments, likes, dislikes, date of those comments with selenium in python.

Can anyone please help me to suggest where I'm went wrong.

Updated Code ( Empty Array )

import time
import csv
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

chrome_path = "/Users/Downloads/chromedriver"
page_url = "https://www.youtube.com/watch?v=qBp1rCz_yQU" 


driver = webdriver.Chrome(executable_path=chrome_path)
driver.get(page_url)
time.sleep(2)  


title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
print(title)


SCROLL_PAUSE_TIME = 2
CYCLES = 100

html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.PAGE_DOWN)  
html.send_keys(Keys.PAGE_DOWN)  
time.sleep(SCROLL_PAUSE_TIME * 3)

for i in range(CYCLES):
    html.send_keys(Keys.END)
    time.sleep(SCROLL_PAUSE_TIME)

driver.find_elements_by_xpath('//div[@id="replies"]/ytd-comment-replies-renderer/ytd-expander/paper-button[@id="more"]')

comment_elems = driver.find_elements_by_xpath('//div[@id="loaded-replies"]//yt-formatted-string[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
print(all_comments)

write_file = "output_31may.csv"
with open(write_file, "w") as output:
    for line in all_comments:
        output.write(line + '\n')

My updated code: (1-05-2019)

import time
import csv
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

chrome_path = "/Users/Downloads/chromedriver"
page_url = "https://www.youtube.com/watch?v=qBp1rCz_yQU" 


driver = webdriver.Chrome(executable_path=chrome_path)
driver.get(page_url)
time.sleep(2)  


title = driver.find_element_by_xpath('//*[@id="container"]/h1/yt-formatted-string').text
print(title)


SCROLL_PAUSE_TIME = 2
CYCLES = 100

html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.PAGE_DOWN)  
html.send_keys(Keys.PAGE_DOWN)  
time.sleep(SCROLL_PAUSE_TIME * 3)

for i in range(CYCLES):
    html.send_keys(Keys.END)
    time.sleep(SCROLL_PAUSE_TIME)


comment_elems = driver.find_elements_by_xpath('//*[@id="content-text"]')
all_comments = [elem.text for elem in comment_elems]
#print(all_comments)

replies_elems =driver.find_elements_by_xpath('//*[@id="replies"]')
all_replies = [elem.text for elem in replies_elems]
print(all_replies)

write_file = "output_replies.csv"
with open(write_file, "w") as output:
    for line in all_replies:
        output.write(line + '\n')

My actual ouput:

['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 39 replies', '', '', 'View 2 replies', '', '', '', 'View reply', '', '', '', '', '', 'View reply', '', '', '', '', '', '', '', '', 'View reply', '', '', 'View reply', '', '', '', '', 'View 43 replies', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 2 replies', '', '', '', '', '', 'View 17 replies', '', '', '', '', 'View 13 replies', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View reply', '', 'View reply', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 5 replies', '', '', '', '', '', 'View reply', '', 'View 28 replies', '', '', 'View 27 replies', '', '', 'View reply', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View reply', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 9 replies', 'View reply', '', '', '', 'View reply', '', 'View 13 replies', '', '', '', 'View reply', 'View 9 replies', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'View 11 replies', '', '', '', '', 'View 2 replies', '', '', '', '', '', 'View reply', '', '', '', '', '', '', 'View reply', '', '', '', '', '', '', '', 'View reply', '', '', '', 'View 2 replies', '', '', '', '']

My expected output to get the replies content message. But I can able to get only for reply count.

Upvotes: 2

Views: 3516

Answers (1)

Adarsh Patel
Adarsh Patel

Reputation: 622

You need to click on View replay to scrape comment replies.

for clicking on that, you can do following:

driver.find_elements_by_xpath("//ytd-button-renderer[@id='more-replies']/a/paper-button[@id="button"]").click()

and after that for scraping replies

driver.find_elements_by_xpath("//div[@id='loaded-replies']/ytd-comment-renderer//yt-formatted-string[@id='content-text']") 

Upvotes: 1

Related Questions