Reputation: 1447
I want to scrape the comments off this page using beautifulsoup - https://www.x....s.com/video_id/the-suburl
The comments are loaded on click via Javascript. The comments are paginated and each page loads comments on click too. I wish to fetch all comments, for each comment, I want to get the poster profile url, the comment, no. of likes, no of dislikes, and time posted (as stated on the page).
The comments can be a list of dictionaries.
How do I go about this?
Upvotes: 2
Views: 109
Reputation: 195418
This script will print all comments found on the page:
import json
import requests
from bs4 import BeautifulSoup
url = 'https://www.x......com/video_id/gggjggjj/'
video_id = url.rsplit('/', maxsplit=2)[-2].replace('video', '')
u = 'https://www.x......com/threads/video/ggggjggl/{video_id}/0/0'.format(video_id=video_id)
comments = requests.post(u, data={'load_all':1}).json()
for id_ in comments['posts']['ids']:
print(comments['posts']['posts'][id_]['date'])
print(comments['posts']['posts'][id_]['name'])
print(comments['posts']['posts'][id_]['url'])
print(BeautifulSoup(comments['posts']['posts'][id_]['message'], 'html.parser').get_text())
# ...etc.
print('-'*80)
Upvotes: 2
Reputation: 858
This would be done with Selenium. Selenium emulates a browser. Depending on your preferences you can use a chrome driver or the Firefox driver which is the geckodriver.
Here is a link on how to install the chrome webdriver: http://jonathansoma.com/lede/foundations-2018/classes/selenium/selenium-windows-install/
Then in your code here is how you would set it up:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# this part may change depending on where you installed the webdriver.
# You may have to define the path to the driver.
# For me my driver is in C:/bin so I do not need to define the path
chrome_options = Options()
# or '-start maximized' if you want the browser window to open
chrome_options.add_argument('--headless')
driver = webdriver.Chrome(options=chrome_options)
driver.get(your_url)
html = driver.page_source # downloads the html from the driver
Selenium has several functions that you can use to perform certain actions such as click on elements on the page. Once you find an element with selenium you can use the .click() method to interact with the element. Let me know if this helps
Upvotes: 0