Reputation: 13316
I want to scrape bookmakers' odds directly from their web pages. Currently I am trying to get the odds from a provider called unibet.com. The problem: I need to send a POST request in order to filter the odds down to the ones I want.
To do that, I go to the page https://www.unibet.com/betting/grid/all-football/germany/bundesliga/1000094994.odds# where there are several checkboxes in the upper part of the bets section. I uncheck every box except "Match", then click the Update button and record the POST request with Chrome. The following screenshot shows what is being sent:
After that I get a filtered result that contains only the odds for the match bets.
Now I want to fetch just these odds from a script, so I wrote the following Python code:
import urllib
import urllib2

req = urllib2.Request('https://www.unibet.com/betting/grid/grid.do?eventGroupIds=1000094994')
req.add_header("Content-type", "application/x-www-form-urlencoded")
# Form fields copied from the request recorded in Chrome:
# only the "Match" filter (1_604139) is switched on.
post_data = [('format', 'iframe'),
             ('filtered', 'true'),
             ('gridSelectedTab', '1'),
             ('_betOfferCategoryTab.filterOptions[1_604139].checked', 'true'),
             ('betOfferCategoryTab.filterOptions[1_604139].checked', 'on'),
             ('_betOfferCategoryTab.filterOptions[1_611318].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_611319].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_611321].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_604144].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_624677].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_604142].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_604145].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_611322].checked', 'false'),
             ('_betOfferCategoryTab.filterOptions[1_604148].checked', 'false'),
             ('gridSelectedTimeframe', '')]
post_data = urllib.urlencode(post_data)
req.add_header('Content-Length', len(post_data))
# Passing a data argument makes urlopen send a POST
resp = urllib2.urlopen(req, post_data)
html = resp.read()
The problem: instead of the filtered result, I get the full list of all odds and bet types, as if every checkbox had been checked. Why does my Python request return the unfiltered data?
Upvotes: 1
Views: 880
Reputation: 7233
The site stores your preferences in a session cookie. Because you are not capturing that cookie and sending it back, the site falls back to its default results when you post the update.
Try this:
import cookielib
import urllib2

# The jar keeps any cookies the server sets and resends them on later requests
cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(
    urllib2.HTTPRedirectHandler(),
    urllib2.HTTPHandler(debuglevel=0),
    urllib2.HTTPSHandler(debuglevel=0),
    urllib2.HTTPCookieProcessor(cookiejar),
)
Now, instead of calling urllib2.urlopen(), call opener.open() and pass it the same arguments, for example resp = opener.open(req, post_data).
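Putting the pieces together, here is a rough sketch of how the whole flow could look. It is not tested against the live site. It assumes that an initial GET of the odds page is what makes the server issue its session cookie, which I have not verified. The names make_opener and fetch_filtered are just illustrative helpers, and the try/except import is only there so the same code also runs on Python 3.

```python
try:
    # Python 2, matching the code in the question
    import urllib2 as urlrequest
    import cookielib as cookies
    from urllib import urlencode
except ImportError:
    # Python 3 homes of the same modules
    import urllib.request as urlrequest
    import http.cookiejar as cookies
    from urllib.parse import urlencode

# URLs taken from the question
PAGE_URL = ('https://www.unibet.com/betting/grid/all-football/'
            'germany/bundesliga/1000094994.odds')
POST_URL = ('https://www.unibet.com/betting/grid/grid.do'
            '?eventGroupIds=1000094994')


def make_opener():
    """Return (opener, jar); the opener stores and resends cookies via jar."""
    jar = cookies.CookieJar()
    opener = urlrequest.build_opener(urlrequest.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_filtered(opener, post_data):
    """GET the page to pick up cookies, then POST the filter form."""
    # Assumption: this first request is what sets the session cookie.
    opener.open(PAGE_URL).read()
    body = urlencode(post_data).encode('ascii')
    # Supplying a body makes this a POST; Content-Type and Content-Length
    # are filled in automatically when they are not set by hand.
    return opener.open(POST_URL, body).read()
```

With the post_data list from the question (before it is urlencoded) you would call opener, jar = make_opener() and then html = fetch_filtered(opener, post_data). If the result is still unfiltered, printing [c.name for c in jar] shows which cookies, if any, the site actually set.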
Upvotes: 1