How to get specific data using BeautifulSoup

Question

I'm not sure how to get a specific result from this:

How would I get the src in

This is what I've tried so far:

import urllib.request
from bs4 import BeautifulSoup

url = "https://someurlhere"

# Send a custom User-Agent header; some servers reject urllib's default one ("Permission denied")
a = urllib.request.Request(url, headers={'User-Agent': "Cliqz"})
b = urllib.request.urlopen(a)

soup = BeautifulSoup(b, 'html.parser')

for video_class in soup.select("div.videoPlayer"):
    # .text returns only the text inside the element, not attribute values such as src
    print(video_class.text)

This returns part of the content, but it does not reach down to the video element.
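For reference, here is a minimal sketch of reading `src` once the `<video>` tag is actually present in the HTML you parse. The markup is hypothetical because the real page structure is not shown above. The key point is that an attribute is read with `.get("src")` (or `tag["src"]`), not with `.text`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page structure was not included in the question.
html = """
<div class="videoPlayer">
  <video class="video-stream" src="https://example.com/clip.mp4"></video>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Select <video> elements nested inside div.videoPlayer and read their src attribute.
# .get() returns None instead of raising KeyError when the attribute is missing.
srcs = [video.get("src") for video in soup.select("div.videoPlayer video")]
print(srcs)  # ['https://example.com/clip.mp4']
```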

Simas Joneliunas · Accepted Answer

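Like Requests, urllib is a simple HTTP client: it downloads the HTML but cannot execute JavaScript. If the video element is added to the page by a script, it will not be in the HTML that BeautifulSoup parses.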
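A small sketch of what that looks like, assuming (hypothetically) that the server sends an empty player container and a script inserts the `<video>` tag later in the browser:

```python
from bs4 import BeautifulSoup

# Hypothetical page as served by the server: the player container is empty and
# a script fills it in later, inside the browser.
served_html = """
<div class="videoPlayer" id="player"></div>
<script>
  var v = document.createElement("video");
  v.src = "https://example.com/clip.mp4";
  document.getElementById("player").appendChild(v);
</script>
"""

soup = BeautifulSoup(served_html, "html.parser")

# BeautifulSoup only parses the markup it is given; it never runs the script,
# so no <video> element exists in the parsed tree.
videos = soup.select("div.videoPlayer video")
print(videos)  # []

# The URL is still present, but only as text inside the <script> tag.
script_text = soup.find("script").string
print("clip.mp4" in script_text)  # True
```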

You still have three options to try, though:

1. Look through the HTML source (b) and check whether any of the site's scripts contain the data you need. Usually the page keeps the URL (which, I assume, is what you want to scrape) in some sort of container, such as JavaScript code or a JSON object, that you can scrape.
2. Look at the site's XHR requests (for example, in the Network tab of your browser's developer tools) and check whether any of them query an external source for the video data. If so, see whether you can imitate that request to get the data you need.
3. (Last resort) Use Selenium to drive a real browser that downloads and renders the website (Link1, Link2). PhantomJS was the usual choice, but it is no longer maintained; headless Chrome or Firefox does the same job. You can find out more about how to use Selenium in this SO post: https://stackoverflow.com/a/26440563/3986395
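Option 1 can be sketched as follows. This assumes, hypothetically, that the page embeds its player settings as a JSON object assigned to a variable called `playerConfig` with a `videoUrl` key; the real names on your site will differ, so inspect the source first and adjust the pattern.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the variable name "playerConfig" and key "videoUrl" are assumptions.
html = """
<div class="videoPlayer"></div>
<script>
  var playerConfig = {"title": "Demo", "videoUrl": "https://example.com/clip.mp4"};
</script>
"""

# Capture the object literal assigned to playerConfig, up to the first "};".
CONFIG_PATTERN = re.compile(r"playerConfig\s*=\s*(\{.*?\})\s*;", re.DOTALL)


def find_video_url(page_html):
    """Return the videoUrl from an inline JSON config, or None if not found."""
    soup = BeautifulSoup(page_html, "html.parser")
    for script in soup.find_all("script"):
        text = script.string or ""  # .string is None for empty or mixed tags
        match = CONFIG_PATTERN.search(text)
        if match:
            config = json.loads(match.group(1))
            return config.get("videoUrl")
    return None


print(find_video_url(html))  # https://example.com/clip.mp4
```

The regex assumes the literal is valid JSON and that `};` does not appear inside it; if the embedded object uses JavaScript-only syntax (unquoted keys, single quotes), `json.loads` will fail and a different extraction approach is needed.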
