Reputation: 2325
I'm parsing a website with Python. The site uses a lot of redirects, and it performs them by calling JavaScript functions.
So when I just use urllib to fetch the site, it doesn't help me, because I can't find the destination URL in the returned HTML.
Is there a way to access the DOM and call the correct JavaScript function from my Python code?
All I need is the URL the redirect takes me to.
Upvotes: 3
Views: 5504
Reputation: 2325
I looked into Selenium. As long as you are not running a pure headless script (i.e. you have a display and can start a "normal" browser), the solution is actually quite simple:
import time
from selenium import webdriver

driver = webdriver.Firefox()
link = "http://yourlink.com"
driver.get(link)
# wait until the JavaScript redirect changes the URL (loops forever if the page never redirects)
while driver.current_url == link:
    time.sleep(1)
redirected_url = driver.current_url
driver.quit()
For my use case this is more than enough. Selenium can also interact with forms and send keystrokes to the website.
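The polling loop above never exits if the page doesn't redirect. Below is a minimal sketch of the same idea with a timeout, assuming you pass in any zero-argument callable that returns the browser's current URL (with Selenium, `lambda: driver.current_url`). The helper `wait_for_url_change` and the `FakeDriver` class are hypothetical names made up for this example so that it runs without a browser.

```python
import time
from typing import Callable


def wait_for_url_change(
    get_current_url: Callable[[], str],
    original_url: str,
    timeout: float = 10.0,
    poll_interval: float = 0.5,
) -> str:
    """Poll get_current_url() until it returns something other than original_url.

    Returns the new URL, or raises TimeoutError once `timeout` seconds pass
    without a change (instead of looping forever).
    """
    deadline = time.monotonic() + timeout
    while True:
        current = get_current_url()
        if current != original_url:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"URL still {original_url!r} after {timeout} s")
        time.sleep(poll_interval)


class FakeDriver:
    """Hypothetical stand-in for a WebDriver: the URL changes on the third read."""

    def __init__(self, urls):
        self._urls = list(urls)
        self._reads = 0

    @property
    def current_url(self):
        # Return the next URL in the list, sticking on the last one.
        url = self._urls[min(self._reads, len(self._urls) - 1)]
        self._reads += 1
        return url


fake = FakeDriver(["http://yourlink.com", "http://yourlink.com", "http://yourlink.com/landing"])
result = wait_for_url_change(lambda: fake.current_url, "http://yourlink.com",
                             timeout=5, poll_interval=0.01)
print(result)  # http://yourlink.com/landing
```

With a real browser this would be `redirected_url = wait_for_url_change(lambda: driver.current_url, link)`. Selenium's own `WebDriverWait` combined with `expected_conditions.url_changes` does the same job if you'd rather not roll your own.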
Upvotes: 10
Reputation: 14959
It doesn't sound like fun to me, but every JavaScript function is also an object, so you can just read the function's source rather than call it, and perhaps the URL is in it. Otherwise, that function may call another one, which you would then have to recurse into... Again, it doesn't sound like fun, but it might be doable.
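A rough sketch of the "read the function instead of calling it" idea: scan the HTML you already get from urllib for common redirect statements and pull out the string literal. The patterns covered (`location = ...`, `location.href = ...`, `location.replace(...)`, `location.assign(...)`) are an assumption about how the site writes its redirects; URLs built dynamically (concatenation, variables, nested function calls) will not be found this way. `find_js_redirects` is a hypothetical helper name.

```python
import re

# Assumed redirect idioms; extend these if the site uses others.
REDIRECT_PATTERNS = [
    # window.location = "...", location.href = '...', document.location = "..."
    re.compile(r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""),
    # location.replace("...") or location.assign('...')
    re.compile(r"""location\.(?:replace|assign)\(\s*["']([^"']+)["']\s*\)"""),
]


def find_js_redirects(source: str) -> list:
    """Return redirect targets found as string literals, in source order."""
    matches = []
    for pattern in REDIRECT_PATTERNS:
        for m in pattern.finditer(source):
            matches.append((m.start(), m.group(1)))
    return [url for _, url in sorted(matches)]


sample_html = """
<script>
function goNext() {
    window.location.href = "http://example.com/next";
}
function goOther() {
    location.replace('/other/page');
}
</script>
"""
print(find_js_redirects(sample_html))  # ['http://example.com/next', '/other/page']
```

To use it on the real page, fetch the HTML first, e.g. `html = urllib.request.urlopen(link).read().decode("utf-8", errors="replace")`, then call `find_js_redirects(html)`. Relative results such as `/other/page` can be resolved against the page URL with `urllib.parse.urljoin`.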
Upvotes: -1