Saw

Reputation: 135

Why is YouTube giving me only its resource pages instead of video links?

For some reason, YouTube only gives me links to its resource pages instead of the video links I am looking for.

code:

import mechanize
import urllib
import urllib.parse as urlparse

url = "https://www.youtube.com"
browser = mechanize.Browser()

browser.open(url)

for link in browser.links():
    new_url = urlparse.urljoin(link.base_url,link.url)
    b1 = urlparse.urlparse(new_url).hostname
    b2 = urlparse.urlparse(new_url).path
    print('http://'+b1+b2)

output I got:

http://accounts.google.com/ServiceLogin
http://www.youtube.com/
http://www.youtube.com/

output I expected:

https://www.youtube.com/watch?v=uVvZlH5gPA
https://www.youtube.com/watch?v=uVvasdad5

Upvotes: 3

Views: 78

Answers (1)

Cornelius Roemer

Reputation: 7949

TL;DR: YouTube doesn't want you to scrape it.

Before scraping links, I'd first check which page you actually get back. YouTube does all sorts of things to make scraping hard. If you're new to scraping and to mechanize, I'd suggest learning from simpler examples first.

I ran your code, saved the response as HTML, and opened it in a browser. It's clearly not the standard page you expected.

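# example.py
import mechanize

# Fetch the page and print its HTML so the output can be redirected to a file.
browser = mechanize.Browser()
response = browser.open("https://www.youtube.com")
print(response.read().decode("utf-8", errors="replace"))

Run it and save the output:

python3 example.py > example.html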

This is what I get when I open example.html with a browser:

[screenshot of example.html in a browser; image not preserved]
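
A rough way to check the same thing without opening a browser is to look for /watch links in the HTML you got back. The sketch below uses only the standard library; the names `LinkCollector` and `extract_watch_links` and the sample markup are made up for illustration and are not real YouTube output. It also keeps the query string, because printing only hostname + path (as in the question) would drop the `?v=...` part even if a video link were present.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_watch_links(html, base_url="https://www.youtube.com"):
    """Return absolute /watch URLs found in html, query string included."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).path == "/watch":
            links.append(absolute)
    return links


# Hypothetical sample markup, not real YouTube output.
sample = '<a href="/watch?v=abc123">video</a><a href="/about">about</a>'
print(extract_watch_links(sample))
```

To try it on the saved page, pass in the file contents, e.g. `extract_watch_links(open("example.html", encoding="utf-8").read())`. An empty list would suggest the video links are simply not in the HTML that came back.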

If you need to scrape YouTube, there are many good tutorials online. You may need a different library, such as Selenium, or you may need to set your cookies so that YouTube cannot tell that the request comes from a Python script rather than a normal user.
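
As a small illustration of making a script look more like a normal browser, here is a sketch that builds a request with a browser-like User-Agent header. Note that this swaps mechanize for the standard library's `urllib.request` and only sets a header, which is not the cookie handling mentioned above. The User-Agent string and the helper name `build_request` are assumptions chosen for the example, and there is no guarantee this is enough for YouTube to return video links.

```python
import urllib.request

# Assumed example value: a User-Agent string resembling a desktop browser.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Build (but do not send) a GET request with a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


request = build_request("https://www.youtube.com")
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

Sending it would be `urllib.request.urlopen(request).read()`. If the returned HTML still has no /watch links, the links may be generated by JavaScript in the browser, in which case a browser-driving tool such as Selenium is the more likely route.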

Upvotes: 1
