Kai

Reputation: 2325

Getting the final destination of a JavaScript redirect on a website

I parse a website with Python. The site uses a lot of redirects, and it performs them by calling JavaScript functions.

So when I just use urllib to fetch the site, it doesn't help me, because the destination URL does not appear in the returned HTML.

Is there a way to access the DOM and call the correct JavaScript function from my Python code?

All I need is the URL that the redirect takes me to.

Upvotes: 3

Views: 5504

Answers (2)

Kai

Reputation: 2325

I looked into Selenium. As long as you are not running as a pure script (meaning a setup with no display, where a "normal" browser can't start), the solution is actually quite simple:

import time

from selenium import webdriver

driver = webdriver.Firefox()
link = "http://yourlink.com"
driver.get(link)

# poll until the JavaScript redirect has changed the browser's URL
while link == driver.current_url:
    time.sleep(1)

redirected_url = driver.current_url

For my use case this is more than enough. Selenium can also interact with forms and send keystrokes to the website.
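
One caveat with a bare polling loop is that it never ends if the page does not redirect at all. Below is a minimal sketch of the same idea with a timeout. The helper name `wait_for_redirect` and its parameters are my own and not part of Selenium; it only assumes the object passed in has a `current_url` attribute, as Selenium's WebDriver does.

```python
import time


def wait_for_redirect(driver, start_url, timeout=30.0, poll_interval=0.5):
    """Poll driver.current_url until it differs from start_url.

    Returns the new URL, or None if nothing changed within `timeout` seconds.
    `driver` is any object exposing a `current_url` attribute (hypothetical
    helper, not a Selenium API).
    """
    deadline = time.monotonic() + timeout
    while True:
        current = driver.current_url
        if current != start_url:
            # the browser has navigated somewhere else
            return current
        if time.monotonic() >= deadline:
            # no redirect observed in time; let the caller decide what to do
            return None
        time.sleep(poll_interval)
```

After `driver.get(link)` you would call `wait_for_redirect(driver, link)` and treat `None` as "no redirect happened". Selenium also ships `WebDriverWait` together with `expected_conditions.url_changes`, which should cover the same case if you would rather not hand-roll the loop.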

Upvotes: 10

Lucas

Reputation: 14959

It doesn't sound like fun to me, but every JavaScript function is also an object, so you can just read the function's source rather than call it, and perhaps the URL is in it. Otherwise, that function may call another one, which you would then have to recurse into... Again, it doesn't sound like fun, but it might be doable.
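
From the Python side, "reading the function" could look roughly like the sketch below: scan the fetched HTML/JavaScript text for common redirect idioms and pull out the URL. The function name `find_redirect_urls` and the pattern are my own assumptions, and this only works when the target is written as a plain string literal; a URL that is computed at runtime would not be found this way.

```python
import re
from urllib.parse import urljoin

# Matches redirect idioms that carry a literal URL, for example:
#   window.location = "...";  location.href = '...';
#   location.replace("...");  location.assign("...");
REDIRECT_PATTERN = re.compile(
    r"""\b(?:window\.|document\.)?location(?:\.href)?\s*
        (?:=\s*|\.(?:replace|assign)\s*\(\s*)
        (["'])(?P<url>.*?)\1""",
    re.VERBOSE,
)


def find_redirect_urls(source, base_url):
    """Return literal redirect targets found in `source` as absolute URLs.

    `source` is the raw HTML/JavaScript text, `base_url` is the address
    the text was fetched from (used to resolve relative targets).
    """
    return [
        urljoin(base_url, match.group("url"))
        for match in REDIRECT_PATTERN.finditer(source)
    ]
```

You would feed it the text you already get from urllib. If it returns several candidates, you still have to work out which function the page actually calls.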

Upvotes: -1
