toom

Reputation: 13316

urllib2 sending postdata

I want to scrape betting quotes (odds) directly from bookmakers' web pages. Currently I am trying to get the quotes from a provider called unibet.com. The problem: I need to send a POST request in order to filter the results down to the quotes I want.

So I open the following page: https://www.unibet.com/betting/grid/all-football/germany/bundesliga/1000094994.odds# In the upper part of the bets section there are several checkboxes. I uncheck every box except "Match", click the Update button, and record the resulting POST request with Chrome. The following screenshot shows what is being sent:

[Screenshot: form data of the recorded POST request]

After that I get a filtered result that only contains the quotes for a match.

Now I just want to fetch these quotes, so I wrote the following Python code:

    import urllib
    import urllib2

    req = urllib2.Request('https://www.unibet.com/betting/grid/grid.do?eventGroupIds=1000094994')
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    post_data = [ ('format','iframe'),
                  ('filtered','true'),
                  ('gridSelectedTab','1'),
                  ('_betOfferCategoryTab.filterOptions[1_604139].checked','true'),
                  ('betOfferCategoryTab.filterOptions[1_604139].checked','on'),
                  ('_betOfferCategoryTab.filterOptions[1_611318].checked','false'),
                  ('_betOfferCategoryTab.filterOptions[1_611319].checked','false'),
                  ('_betOfferCategoryTab.filterOptions[1_611321].checked','false'),
                  ('_betOfferCategoryTab.filterOptions[1_604144].checked','false'),
                  ('_betOfferCategoryTab.filterOptions[1_624677].checked','false'),
                  ('_betOfferCategoryTab.filterOptions[1_604142].checked','false'),
                  ('_betOfferCategoryTab.filterOptions[1_604145].checked','false'),
                  ('_betOfferCategoryTab.filterOptions[1_611322].checked','false'),
                  ('_betOfferCategoryTab.filterOptions[1_604148].checked','false'),
                  ('gridSelectedTimeframe','')]
    post_data = urllib.urlencode(post_data)
    # urllib2 adds the Content-Length header itself when a request body is passed
    resp = urllib2.urlopen(req, post_data)
    html = resp.read()

The problem: instead of a filtered result I get the full list of all quotes and bet types, as if all checkboxes had been checked. Why does my Python request return the unfiltered data?

Upvotes: 1

Views: 880

Answers (1)

That1Guy

Reputation: 7233

The site stores your preferences in a session cookie. Because you're not capturing that cookie and sending it back, the site falls back to its default (unfiltered) results when you submit the update.

Try this:

    import cookielib
    import urllib2

    # The cookie jar keeps whatever cookies the server sets between requests
    cookiejar = cookielib.CookieJar()
    opener = urllib2.build_opener(
        urllib2.HTTPRedirectHandler(),
        urllib2.HTTPHandler(debuglevel=0),
        urllib2.HTTPSHandler(debuglevel=0),
        urllib2.HTTPCookieProcessor(cookiejar),
    )

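Now, instead of calling urllib2.urlopen(), call opener.open() and pass the same arguments, e.g. opener.open(req, post_data). You will probably also need to request the odds page through the same opener first, so that the session cookie is stored in cookiejar before you send the POST.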
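Putting the pieces together, the flow could look roughly like the sketch below. This is only an illustration under a few assumptions: the helper names (make_cookie_opener, build_post_request, fetch_filtered) are made up for this example, and it assumes the site sets its session cookie on a plain GET of the odds page, which I haven't verified. The try/except import is only there so the same code also runs on Python 3, where urllib2 and cookielib were renamed.

```python
try:  # Python 2, as used in the question
    import urllib2
    import cookielib
    from urllib import urlencode
except ImportError:  # Python 3 equivalents of the same modules
    import urllib.request as urllib2
    import http.cookiejar as cookielib
    from urllib.parse import urlencode


def make_cookie_opener():
    """Return (opener, cookiejar); the opener stores and resends cookies."""
    cookiejar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookiejar))
    return opener, cookiejar


def build_post_request(url, fields):
    """Build a form-encoded POST request from a list of (name, value) pairs."""
    body = urlencode(fields).encode("ascii")  # request body must be bytes on Python 3
    req = urllib2.Request(url, body)          # passing a body makes this a POST
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def fetch_filtered(opener, page_url, post_url, fields):
    """GET the page first (assumed to set the session cookie), then POST the filter."""
    opener.open(page_url).read()
    return opener.open(build_post_request(post_url, fields)).read()
```

You would then call something like html = fetch_filtered(opener, page_url, post_url, post_data), where post_data is the list of tuples from the question before it is URL-encoded. If the result is still unfiltered, it is worth printing the contents of the cookie jar after the first request to check whether the server set a cookie at all.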

Upvotes: 1
