I'm trying to parse a web page that contains a Flash player, and I fetch the page's HTML using urllib2.
The page uses JW Player, and the data I need is inside the Flash object tag, which looks something like this:
<object width="100%" height="100%" type="application/x-shockwave-flash" data="https://salsalessons.tv/wp-content/themes/bstrap/js/jwplayer/player.swf" bgcolor="#000000" id="jwplayer-1" name="jwplayer-1" tabindex="0">
<param name="allowfullscreen" value="true">
<param name="allowscriptaccess" value="always">
<param name="seamlesstabbing" value="true">
<param name="wmode" value="opaque">
<param name="flashvars" value="SomeValues">
</object>
The data I need is the value of one of these param tags. The problem is that urllib2 downloads the page as if Flash weren't installed, so I get this code where the above should have been:
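Once you have HTML that actually contains the object element (for example, from a browser that has run the page's JavaScript), the standard library's html.parser is enough to pull out a param value. A minimal sketch in Python 3, where urllib2 became urllib.request; ParamExtractor and get_params are hypothetical names, and the id jwplayer-1 comes from the snippet above:

```python
from html.parser import HTMLParser


class ParamExtractor(HTMLParser):
    """Collect name -> value for <param> tags inside one <object> element."""

    def __init__(self, object_id):
        super().__init__()
        self.object_id = object_id
        self.inside = False  # True while between <object id=...> and </object>
        self.params = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "object" and attrs.get("id") == self.object_id:
            self.inside = True
        elif tag == "param" and self.inside:
            self.params[attrs.get("name")] = attrs.get("value")

    def handle_endtag(self, tag):
        if tag == "object":
            self.inside = False


def get_params(html, object_id="jwplayer-1"):
    """Return the <param> name/value pairs of the given object, or {} if it is absent."""
    parser = ParamExtractor(object_id)
    parser.feed(html)
    parser.close()
    return parser.params


if __name__ == "__main__":
    snippet = ('<object id="jwplayer-1" type="application/x-shockwave-flash">'
               '<param name="flashvars" value="SomeValues"></object>')
    print(get_params(snippet)["flashvars"])  # prints SomeValues
```

Run against the placeholder div that urllib2 returns, get_params gives an empty dict, which is exactly the symptom described here.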
<div id="jwplayer-1">
<a href="http://get.adobe.com/flashplayer/">Get Adobe Flash Player</a> to watch this video.
</div>
What can I do so that urllib2 downloads the page as if Flash Player were installed?
Thanks.
It's not that Flash isn't installed; it's that the JW Player JavaScript isn't running, so nothing replaces that div with the player. Turn JavaScript off in your browser and you'll get the same result.
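Because the player is built by JavaScript, the raw HTML that urllib2 returns may still contain the script call that configures it. A minimal sketch, assuming the page calls jwplayer("jwplayer-1").setup({...}) inline rather than from an external .js file (check the page source to confirm). find_setup_configs and get_option are hypothetical helpers, the sample config is made up for illustration, and the regexes only handle simple quoted string options:

```python
import re

# Captures the player id and the object literal passed to jwplayer("id").setup({...});
# Assumption: the literal contains no "})" followed by ";" before its real end.
SETUP_RE = re.compile(
    r"""jwplayer\(\s*["']([^"']+)["']\s*\)\s*\.setup\(\s*(\{.*?\})\s*\)\s*;""",
    re.DOTALL,
)


def find_setup_configs(html):
    """Return {player_id: raw config text} for each inline setup call."""
    return {m.group(1): m.group(2) for m in SETUP_RE.finditer(html)}


def get_option(config_text, key):
    """Return a quoted string option such as file: "x.mp4", or None."""
    pattern = (r"""(?:^|[\s,{])["']?""" + re.escape(key)
               + r"""["']?\s*:\s*["']([^"']*)["']""")
    m = re.search(pattern, config_text)
    return m.group(1) if m else None


if __name__ == "__main__":
    page = '''<div id="jwplayer-1">Get Adobe Flash Player</div>
<script type="text/javascript">
jwplayer("jwplayer-1").setup({
    flashplayer: "/js/jwplayer/player.swf",
    file: "lesson1.mp4"
});
</script>'''
    config = find_setup_configs(page)["jwplayer-1"]
    print(get_option(config, "file"))  # prints lesson1.mp4
```

If the setup call lives in an external script or is assembled at runtime, this finds nothing and a real browser is the way to go.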
You'll need to mimic a browser that actually executes JavaScript. Selenium is one option, and a quick search around SO turned up a few others:
Python Scraper for Javascript?
Scraping javascript-generated data using Python
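A minimal Selenium sketch, assuming Selenium and a browser driver are installed. The function takes an already-created WebDriver so it can be exercised without a browser; get_flashvars and PAGE_URL are hypothetical names, and the CSS selector is built from the object id and param name in the question. implicitly_wait makes find_elements poll for up to timeout seconds while the page's JavaScript builds the player:

```python
# "css selector" is the string value of selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def get_flashvars(driver, url, object_id="jwplayer-1", timeout=10):
    """Load url in a real browser and return the flashvars <param> value, or None."""
    driver.implicitly_wait(timeout)  # element lookups retry until found or timeout
    driver.get(url)
    selector = 'object#%s param[name="flashvars"]' % object_id
    found = driver.find_elements(CSS_SELECTOR, selector)
    return found[0].get_attribute("value") if found else None


# Usage (assumes Firefox and geckodriver; PAGE_URL is the page you are scraping):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   try:
#       print(get_flashvars(driver, PAGE_URL))
#   finally:
#       driver.quit()
```

Current browsers no longer support Flash, so the script may render something other than the object tag in a Selenium-driven browser; in that case the function returns None rather than raising.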