Reputation: 135
For some reason, YouTube only gives me links to its resource pages instead of the video links I am looking for.
code:
import mechanize
import urllib
import urllib.parse as urlparse
url = "https://www.youtube.com"
browser = mechanize.Browser()
browser.open(url)
for link in browser.links():
    new_url = urlparse.urljoin(link.base_url, link.url)
    b1 = urlparse.urlparse(new_url).hostname
    b2 = urlparse.urlparse(new_url).path
    print('http://' + b1 + b2)
output I got:
http://accounts.google.com/ServiceLogin
http://www.youtube.com/
http://www.youtube.com/
output I expected:
https://www.youtube.com/watch?v=uVvZlH5gPA
https://www.youtube.com/watch?v=uVvasdad5
Upvotes: 3
Views: 78
Reputation: 7949
TL;DR: YouTube doesn't want you to scrape it.
Before scraping links, I'd first check what page you actually get back. YouTube does all sorts of things to make scraping hard. If you're new to scraping and to this library, I'd suggest starting with simpler examples first.
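One way to do that check without opening a browser is to parse the returned HTML and print its title and links. Below is a minimal sketch using only the standard library. The `PageSummary` class, the `summarize` helper and the sample HTML are made up for illustration; they are not part of mechanize and do not reproduce what YouTube actually sends.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the <title> text and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.hrefs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html):
    """Return (title, list of hrefs) for an HTML string."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.hrefs


# Hypothetical stand-in for a downloaded page, not real YouTube output
sample = (
    "<html><head><title>Example sign-in page</title></head>"
    "<body><a href='https://accounts.google.com/ServiceLogin'>Sign in</a></body></html>"
)
title, hrefs = summarize(sample)
print(title)
print(hrefs)
```

If the title or the handful of links is not what you see in a normal browser session, the script is probably being served a different page than a regular visitor.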
I ran your code, saved the response as HTML, and looked at it in the browser. It's clearly not the standard page you expected.
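# example.py
import mechanize
browser = mechanize.Browser()  # open() needs a Browser instance
response = browser.open("https://www.youtube.com")  # the URL must be a quoted string
print(response.read().decode("utf-8", errors="replace"))  # read() returns bytes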
python3 example.py > example.html
This is what I get when I open example.html with a browser:
If you need to scrape YouTube, there are many good tutorials on the internet. You may need a different library, such as Selenium, or you may need to set your cookies so that YouTube doesn't see that the requests come from a Python script rather than a normal user.
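Separately, note that the loop in the question prints only the hostname and the path, so the `?v=...` query string would be dropped even if a watch link were found. Here is a rough sketch that keeps the query string and searches raw page text for `/watch?v=` patterns. It assumes the video references appear literally in the HTML you receive, which is not guaranteed. The `find_watch_urls` and `rebuild` helpers, the ID character set and the sample text are illustrative only.

```python
import re
from urllib.parse import urljoin, urlparse

# Assumption: video IDs consist of letters, digits, "_" and "-"
WATCH_RE = re.compile(r"/watch\?v=([A-Za-z0-9_-]+)")


def find_watch_urls(html, base="https://www.youtube.com"):
    """Return unique absolute watch URLs found in html, in first-seen order."""
    found = []
    for video_id in WATCH_RE.findall(html):
        url = urljoin(base, "/watch?v=" + video_id)
        if url not in found:
            found.append(url)
    return found


def rebuild(url):
    """Reassemble a URL from its parts without losing the query string."""
    parts = urlparse(url)
    rebuilt = parts.scheme + "://" + parts.netloc + parts.path
    if parts.query:
        rebuilt += "?" + parts.query
    return rebuilt


# Made-up page text, not real YouTube output
sample = (
    '<a href="/watch?v=abc123XYZ_-">one</a> '
    '{"url":"/watch?v=abc123XYZ_-"} '
    '<a href="/watch?v=def456">two</a>'
)
for url in find_watch_urls(sample):
    print(rebuild(url))
```

A regular expression is used here instead of `browser.links()` because it also picks up references outside of `<a>` tags, for example inside embedded script data.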
Upvotes: 1