Reputation: 31
I'm trying to do some web scraping, and it involves submitting a form with a multiple-select list box that looks similar to this:
<select name="multipleSelectForm" multiple="multiple" size="5">
<option value="value1">value1</option>
<option value="value2">value2</option>
</select>
Now, I want to send both value1 and value2 using pycurl, for example:
import urllib
import pycurl
c = pycurl.Curl()
data = {'multipleSelectForm': 'value1',
        'multipleSelectForm': 'value2'}
c.setopt(c.URL, 'http://www.example.com')
c.setopt(c.POST, 1)
post = urllib.urlencode(data)
c.setopt(c.POSTFIELDS, post)
c.perform()
Now, the obvious problem with this is that it's sending multipleSelectForm multiple times. I suspect the requested page is looking for a multipleSelectForm array rather than individual variables (this is just a guess; I'm not actually sure), and hence the POST data it receives isn't correct.
I used Google Chrome's dev tools to inspect the traffic the browser sends, and the Form Data looked like this:
multipleSelectForm:value1
multipleSelectForm:value2
I'm a bit lost on how to approach this. Any help would be appreciated.
Upvotes: 3
Views: 4610
Reputation: 179
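From the code in the question, the data you're actually sending is just
{ 'multipleSelectForm':'value2' }
because a dictionary holds only one value per key, so the second entry overwrites the first. If you set it up as a sequence of tuple pairs instead, urlencode emits one field per pair, which is what you want: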
data = (('multipleSelectForm', 'value1'), ('multipleSelectForm', 'value2'))
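To see the difference without sending anything, here is a minimal sketch that encodes the same field three ways. The variable names as_dict, as_pairs and as_list are just illustrative, and the try/except import is only there so the snippet runs on both Python 2 (as in the question) and Python 3. The doseq=True variant is an alternative that was not part of the suggestion above.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, as in the question

# A dict literal with a repeated key keeps only the last value.
as_dict = {'multipleSelectForm': 'value1',
           'multipleSelectForm': 'value2'}

# A sequence of (name, value) pairs preserves every occurrence, in order.
as_pairs = [('multipleSelectForm', 'value1'),
            ('multipleSelectForm', 'value2')]

# Alternative: one key mapped to a list, expanded with doseq=True.
as_list = {'multipleSelectForm': ['value1', 'value2']}

print(urlencode(as_dict))               # multipleSelectForm=value2
print(urlencode(as_pairs))              # multipleSelectForm=value1&multipleSelectForm=value2
print(urlencode(as_list, doseq=True))   # multipleSelectForm=value1&multipleSelectForm=value2
```

The string produced from as_pairs can be passed to c.setopt(c.POSTFIELDS, post) exactly as in the question. It has the same repeated-name shape as the Form Data Chrome showed.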
You can test this yourself by setting up a tiny debug HTTP server:
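from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer  # Python 2 module
class hand(BaseHTTPRequestHandler):
    # Override __init__ to dump the raw request instead of handling it.
    # No response is sent, so the client will not get a valid HTTP reply.
    def __init__(self, socket, *args):
        print socket.recv(10000)
server = HTTPServer(('', 8080), hand)  # listen on all interfaces, port 8080
server.serve_forever()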
and then hitting it with your script. I used this to confirm that passing the tuple list does what I expect.
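If you're on Python 3, where BaseHTTPServer became http.server, a similar check might look like the sketch below. It is not the server I used above. The names parse_form_body, EchoFormHandler and run are made up for illustration, and it assumes the body is ordinary application/x-www-form-urlencoded data. Unlike the raw dump, it decodes the body and answers the request, so the client gets a proper reply.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def parse_form_body(body):
    """Decode a urlencoded POST body (bytes) into {name: [values, ...]}."""
    return parse_qs(body.decode('ascii'))


class EchoFormHandler(BaseHTTPRequestHandler):  # hypothetical handler name
    def do_POST(self):
        # Read exactly as many bytes as the client announced.
        length = int(self.headers.get('Content-Length', 0))
        fields = parse_form_body(self.rfile.read(length))
        print(fields)  # every value received for each field name

        # Echo the parsed fields back so the client sees a valid response.
        reply = repr(fields).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def run(port=8080):  # port 8080 is an assumption, matching the snippet above
    HTTPServer(('', port), EchoFormHandler).serve_forever()
```

Call run() in one terminal and point the pycurl script at http://localhost:8080. When both values arrive, the printed dictionary maps multipleSelectForm to a list containing value1 and value2.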
Upvotes: 1