How to __scrape__ data off page loaded via Javascript

Question

I want to scrape the comments off this page using beautifulsoup - https://www.x....s.com/video_id/the-suburl

The comments are loaded on click via Javascript. The comments are paginated and each page loads comments on click too. I wish to fetch all comments, for each comment, I want to get the poster profile url, the comment, no. of likes, no of dislikes, and time posted (as stated on the page).

The comments can be a list of dictionaries.

How do I go about this?

Andrej Kesely · Accepted Answer

This script will print all comments found on the page:

import json
import requests
from bs4 import BeautifulSoup


url = 'https://www.x......com/video_id/gggjggjj/'
video_id = url.rsplit('/', maxsplit=2)[-2].replace('video', '')

u = 'https://www.x......com/threads/video/ggggjggl/{video_id}/0/0'.format(video_id=video_id)
comments = requests.post(u, data={'load_all':1}).json()

for id_ in comments['posts']['ids']:
    print(comments['posts']['posts'][id_]['date'])
    print(comments['posts']['posts'][id_]['name'])
    print(comments['posts']['posts'][id_]['url'])
    print(BeautifulSoup(comments['posts']['posts'][id_]['message'], 'html.parser').get_text())
    # ...etc.
    print('-'*80)

How to scrape data off page loaded via Javascript

Answers (2)

Related Questions