Jebiel

Reputation: 52

How to use requests in Python 3 to fetch data from a website that uses JavaScript and jQuery

I have been playing around with the requests library in Python 3 for quite some time now and have decided to create a test program. For this program, I'm using the website https://ytmp3.cc/ as an example, but it turns out that a lot is happening on the client side.

Some keys and other values are generated along the way. I have been using Firefox's built-in network monitor to figure out which requests produce them, but without luck.

As far as I know, the requests library can't keep a "page" open, run its JavaScript, and update the DOM and content as further requests are made.

Could anyone take a look and give an educated guess as to how the special keys are generated, and how I could obtain them for my own requests?

For example, when the webpage loads, the first request is for the root, and the response contains the page's HTML. At the bottom, there's a URL containing a key and a number:

<script id="cs" src="js/converter-1.0.js?o=7_1a-a&=_1519520467"></script>
id      7_1a-a
number  _1519520467
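Extracting those two values in Python can be sketched as follows. This is a minimal sketch that assumes the `<script id="cs" ...>` tag keeps exactly the shape shown above (`id` before `src`, the key in `o=`, the number under an unnamed parameter); `extract_converter_key` is a hypothetical helper name, and the HTML would come from something like `requests.get("https://ytmp3.cc/").text`.

```python
import html
import re
from urllib.parse import parse_qsl, urlsplit

# Matches the converter <script> tag shown above; assumes id="cs" comes before src.
SCRIPT_RE = re.compile(r'<script[^>]*\bid="cs"[^>]*\bsrc="([^"]+)"')


def extract_converter_key(page_html):
    """Return (key, number) from the converter script URL, or None if absent."""
    match = SCRIPT_RE.search(page_html)
    if match is None:
        return None
    # Attribute values may be HTML-escaped (&amp;), so unescape before parsing.
    src = html.unescape(match.group(1))
    params = dict(parse_qsl(urlsplit(src).query))
    # "o" holds the key; the number is sent under an empty parameter name.
    return params.get("o"), params.get("")


page = '<script id="cs" src="js/converter-1.0.js?o=7_1a-a&=_1519520467"></script>'
print(extract_converter_key(page))  # ('7_1a-a', '_1519520467')
```

If the site changes its markup, the helper returns `None` rather than guessing.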

This is used to make the next request, after which many more requests follow and further keys are generated. However, I can't find where these come from, since none of the responses contain them.

I know that when a YouTube link is submitted, a request is made to the following URL:

https://d.ymcdn.cc/check.php?callback=jQuery33107639361236859977_1519520481166&v=eVD9j36Ke94&f=mp3&k=7_1a-a&_=1519520481168

This returns the following:

jQuery33107639361236859977_1519520481166({"sid":"21","hash":"2a6b2475b059101480f7f16f2dde67ac","title":"M\u00d8 - Kamikaze (Official Video)","ce":1,"error":""})
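Because this is JSONP (a JSON object wrapped in a call to the callback function), the wrapper has to be stripped before `json.loads` can read it. A minimal sketch; `parse_jsonp` is a hypothetical helper, and the regex assumes the wrapper is a plain function name as in the response above:

```python
import json
import re

# callbackName( ...JSON... ) with an optional trailing semicolon.
JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_jsonp(text):
    """Strip a JSONP callback wrapper and decode the JSON payload inside it."""
    match = JSONP_RE.match(text)
    if match is None:
        # No wrapper found: assume the body is already plain JSON.
        return json.loads(text)
    return json.loads(match.group(1))


body = ('jQuery33107639361236859977_1519520481166({"sid":"21",'
        '"hash":"2a6b2475b059101480f7f16f2dde67ac",'
        '"title":"M\\u00d8 - Kamikaze (Official Video)","ce":1,"error":""})')
data = parse_jsonp(body)
print(data["hash"])  # 2a6b2475b059101480f7f16f2dde67ac
```

The greedy `(.*)` runs to the last `)`, so parentheses inside the title such as "(Official Video)" don't cut the payload short.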

From this, I can construct the download URL using the hash from above:

https://yyd.ymcdn.cc/ + 2a6b2475b059101480f7f16f2dde67ac (hash) + /eVD9j36Ke94 (YouTube video ID)
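Assuming that URL pattern holds (it is inferred from observed traffic, not documented by the site), building the download link is plain string joining; `build_download_url` is a hypothetical name:

```python
from urllib.parse import quote

# Host as observed in the network monitor; the site may change it at any time.
DOWNLOAD_BASE = "https://yyd.ymcdn.cc/"


def build_download_url(file_hash, video_id):
    """Join the hash returned by check.php and the YouTube video ID."""
    # quote() leaves hex hashes and typical video IDs (letters, digits, - and _) unchanged.
    return DOWNLOAD_BASE + quote(file_hash) + "/" + quote(video_id)


print(build_download_url("2a6b2475b059101480f7f16f2dde67ac", "eVD9j36Ke94"))
# https://yyd.ymcdn.cc/2a6b2475b059101480f7f16f2dde67ac/eVD9j36Ke94
```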

But how do I get the values jQuery33107639361236859977_1519520481166 and 1519520481168, which I need to create the request?
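For reference, both values have the shape jQuery 3.x gives them. As far as I can tell from jQuery's source, the callback name is "jQuery" plus the digits of the version string and `Math.random()`, followed by "_" and a counter seeded with `Date.now()`; the `_` cache buster is taken from the same counter. The "jQuery331" prefix presumably means the site loads jQuery 3.3.1. A hypothetical Python helper that mimics the format, only useful if you want requests to look like the browser's:

```python
import random
import re
import time


def jquery_style_params(jquery_version="3.3.1"):
    """Mimic jQuery 3.x's JSONP callback name and `_` cache-buster value.

    Hypothetical helper; the version default is an assumption based on the
    "jQuery331" prefix seen in the captured request.
    """
    # Keep only the digits, like jQuery's (version + Math.random()).replace(/\D/g, "").
    expando = "jQuery" + re.sub(r"\D", "", jquery_version + repr(random.random()))
    nonce = int(time.time() * 1000)  # millisecond timestamp, like Date.now()
    callback = f"{expando}_{nonce}"
    # jQuery increments the same counter on each use, so later values are slightly larger.
    cache_buster = str(nonce + 1)
    return callback, cache_buster


callback, underscore = jquery_style_params()
print(callback, underscore)
```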

Upvotes: 0

Views: 500

Answers (1)

Blender

Reputation: 298136

You can probably save yourself and the operator of that website a lot of headaches by just using youtube-dl, specifically with the --extract-audio --audio-format mp3 options. It's probably what that website itself uses.

youtube-dl is written in Python and can easily be used programmatically.
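A minimal sketch of calling youtube-dl from Python, modeled on the embedding example in its README. The `FFmpegExtractAudio` post-processor with `preferredcodec: "mp3"` is what `--extract-audio --audio-format mp3` sets up; the `format` and `outtmpl` values are assumptions for this sketch, and ffmpeg (or avconv) must be installed for the conversion:

```python
def mp3_options(output_template="%(title)s.%(ext)s"):
    """youtube-dl options roughly equivalent to --extract-audio --audio-format mp3."""
    return {
        "format": "bestaudio/best",       # prefer an audio-only stream
        "outtmpl": output_template,       # output file name template
        "postprocessors": [{
            "key": "FFmpegExtractAudio",  # requires ffmpeg or avconv on PATH
            "preferredcodec": "mp3",
        }],
    }


def download_mp3(url):
    """Download `url` and convert its audio to MP3 with youtube-dl."""
    # Imported lazily so the options can be inspected without youtube-dl installed.
    import youtube_dl

    with youtube_dl.YoutubeDL(mp3_options()) as ydl:
        ydl.download([url])
```

Calling `download_mp3("https://www.youtube.com/watch?v=eVD9j36Ke94")` would then save the audio of the video from the question as an MP3 in the current directory.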

If you insist on sending requests to that website for whatever reason, here's how I'd do it:

  • callback=jQuery33107639361236859977_1519520481166 specifies the name of the callback for the JSONP request. Any name you provide will be printed back out. For example, passing callback=foo will result in the following response:

    foo({...})
    

    You can omit it entirely and the server will serve just a JSON response in this case, which is nice.

  • _=1519520481168 is just there to prevent the response from being cached. Like the callback name above, it's generated client-side (jQuery uses a millisecond timestamp here), and its actual value doesn't matter. The website does check that it exists, however, so you have to pass something.

  • The website, like many, checks for a valid Referer header.
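Putting those three points together, the query string and headers can be assembled with the standard library alone. A sketch assuming the endpoint and parameter names from the question still apply; `build_check_request` is a hypothetical helper:

```python
import time
from urllib.parse import urlencode

CHECK_URL = "https://d.ymcdn.cc/check.php"  # endpoint from the question
REFERER = "https://ytmp3.cc/"


def build_check_request(video_id, key, fmt="mp3"):
    """Return (url, headers) for check.php following the three points above."""
    params = {
        "v": video_id,
        "f": fmt,
        "k": key,
        # No "callback" parameter, so the server should answer with plain JSON.
        # "_" only has to be present; a millisecond timestamp mimics jQuery.
        "_": str(int(time.time() * 1000)),
    }
    headers = {"Referer": REFERER}  # the site checks for a valid Referer
    return f"{CHECK_URL}?{urlencode(params)}", headers
```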

Here's a minimal cURL command line to make a request to that website:

curl 'https://d.ymcdn.cc/check.php?v=eVD9j36Ke94&f=mp3&k=aZa4__&_=1' -H 'Referer: https://ytmp3.cc/'
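The same request with requests might look like this, assuming the server still behaves as described above. `check_video` is a hypothetical helper; the `session` parameter only exists so it can be exercised without network access. The `key` must be a current value from the site: the `k=aZa4__` in the cURL line differs from the `7_1a-a` in the question, so it presumably changes over time.

```python
def check_video(video_id, key, session=None):
    """Python equivalent of the cURL command above, using requests.

    `session` may be any object with a requests-compatible .get();
    by default a requests.Session is created.
    """
    if session is None:
        import requests  # third-party: pip install requests
        session = requests.Session()
    response = session.get(
        "https://d.ymcdn.cc/check.php",
        params={"v": video_id, "f": "mp3", "k": key, "_": "1"},
        headers={"Referer": "https://ytmp3.cc/"},
        timeout=10,
    )
    response.raise_for_status()
    data = response.json()  # plain JSON, since no callback was sent
    if data.get("error"):
        raise RuntimeError(f"check.php reported an error: {data['error']}")
    return data
```

On success the returned dict should contain the `hash` and `title` fields seen in the question's JSONP response.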

Upvotes: 1
