Fresh Prince

Reputation: 61

Sending a POST request with urllib.request returns the same page

I'm trying to send a POST request to this site to search its database, but the response is the same page I started from, not the results page I'm looking for. If I request the results URL directly with my parameters, the server denies the request. I feel like I'm missing something; maybe someone can help me out.

import urllib.request

from bs4 import BeautifulSoup

DATA = urllib.parse.urlencode({'plz_ff': 50000, 'plz_ff2': 50030})
DATA = DATA.encode('utf-8')

request = urllib.request.Request("http://www.altenheim-adressen.de/schnellsuche/index.cfm", 'POST')
# adding charset parameter to the Content-Type header.
request.add_header("text/html;charset=UTF-8","application/x-www-form-urlencoded;charset=utf-8Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0")
f = urllib.request.urlopen(request, DATA)

soup = BeautifulSoup(f)

print(soup.prettify)

f.close()
soup.close

EDIT2: The code works now, many thanks. I had to adjust the search parameters to address suche1.cfm, and it now returns the result I'm looking for. This is the finished product:

import urllib.parse
import urllib.request

from bs4 import BeautifulSoup

params = {
    'name_ff': '',
    'strasse_ff': '',
    'plz_ff': 50000,
    'plz_ff2': 50030,
    'ort_ff': '',
    'bundesland_ff': '',
    'land_ff': '',
    'traeger_ff': '',
    'Dachverband_ff': '',
    'submit2': 'Suchen'
}

DATA = urllib.parse.urlencode(params)
DATA = DATA.encode('utf-8')

request = urllib.request.Request(
    "http://www.altenheim-adressen.de/schnellsuche/suche1.cfm",
    DATA)
# adding charset parameter to the Content-Type header.
request.add_header("Content-Type", "application/x-www-form-urlencoded;charset=utf-8")
request.add_header("User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0")
f = urllib.request.urlopen(request)

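# name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(f, 'html.parser')

# prettify is a method; it has to be called to get the formatted markup
print(soup.prettify())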

f.close()

Upvotes: 1

Views: 718

Answers (1)

Martijn Pieters

Reputation: 1123350

You need to pass DATA to the Request object. The second positional argument of Request is the request body, so your call sets the text 'POST' as the body rather than as the HTTP method.

The correct method would be:

request = urllib.request.Request(
    "http://www.altenheim-adressen.de/schnellsuche/index.cfm",
    DATA)
# adding charset parameter to the Content-Type header.
request.add_header("Content-Type", "application/x-www-form-urlencoded;charset=utf-8")
request.add_header("User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0")
f = urllib.request.urlopen(request)

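Note that I also added proper User-Agent and Content-Type headers there. Your add_header call used "text/html;charset=UTF-8" as the header name and a content type fused with a user-agent string as the value, which is not a recognisable header.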
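To check this without touching the network, here is a minimal sketch that only builds Request objects and inspects them. The field values are the ones from the question; the variable names (URL, body, wrong, right) are just for illustration.

```python
import urllib.parse
import urllib.request

URL = "http://www.altenheim-adressen.de/schnellsuche/index.cfm"
body = urllib.parse.urlencode({'plz_ff': 50000, 'plz_ff2': 50030}).encode('utf-8')

# The second positional argument is stored as the body, not as the method.
wrong = urllib.request.Request(URL, 'POST')
print(wrong.data)           # POST

# Passing the encoded form data instead; the method is then inferred.
right = urllib.request.Request(URL, body)
print(right.data)           # b'plz_ff=50000&plz_ff2=50030'
print(right.get_method())   # POST

# add_header takes the header name first, then its value.
right.add_header("Content-Type", "application/x-www-form-urlencoded;charset=utf-8")
print(right.get_header("Content-type"))
```

Request stores header names in capitalized form, which is why the lookup uses "Content-type". If you ever need to set the method explicitly, Request accepts a method= keyword argument for that.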

I do note that the 2 forms on this page use different targets; neither posts to index.cfm:

>>> for form in soup.find_all('form'):
...     print(form.attrs.get('action'))
... 
suche1.cfm
suche1b.cfm

so if you expected to use one of these forms you'll need to use the correct target URL here. You'd also need to verify your POST form fields; I see that the form posting to suche1b.cfm has similar form fields, but they use plz_ffb and plz_ff2b, not plz_ff and plz_ff2.
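If you want to list each form's target and field names in one go, something along these lines could work. This is only a sketch: it uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages, and SAMPLE_HTML is a hypothetical, heavily simplified stand-in for the real page (only the action and field names mentioned above are taken from it). FormCollector is a made-up helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = "http://www.altenheim-adressen.de/schnellsuche/index.cfm"

# Hypothetical, simplified markup; the real page has more fields.
SAMPLE_HTML = """
<form action="suche1.cfm" method="post">
  <input name="plz_ff"><input name="plz_ff2">
</form>
<form action="suche1b.cfm" method="post">
  <input name="plz_ffb"><input name="plz_ff2b">
</form>
"""


class FormCollector(HTMLParser):
    """Collect each form's action and the names of its fields."""

    def __init__(self):
        super().__init__()
        self.forms = []  # list of (action, [field names])

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append((attrs.get("action"), []))
        elif tag in ("input", "select", "textarea") and self.forms and "name" in attrs:
            # attach the field to the most recently opened form
            self.forms[-1][1].append(attrs["name"])


collector = FormCollector()
collector.feed(SAMPLE_HTML)

for action, fields in collector.forms:
    # relative actions are resolved against the URL of the page holding the form
    print(urljoin(PAGE_URL, action), fields)
```

The urljoin call is the part that matters for the POST: a relative action such as suche1.cfm resolves to a URL next to index.cfm, and that resolved URL is where the request has to go.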

You'll likely also want to send all form fields, even with empty strings:

params = {
    'name_ffb': '',
    'ort_ffb': '',
    'strasse_ffb': '',
    'plz_ffb': '50000',
    'plz_ff2b': '50030',
    'land_ff': '',
    'rubrik_ff': '',
    'submit22': 'Suchen'
}
DATA = urllib.parse.urlencode(params)
DATA = DATA.encode('utf-8')
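As a quick sanity check, you can look at what the encoded body contains before sending it. The sketch below uses a shortened, illustrative subset of the fields above and shows that urlencode keeps empty values as name= pairs, so the server still sees every field.

```python
import urllib.parse

# shortened subset of the form fields, for illustration only
params = {
    'name_ffb': '',
    'plz_ffb': '50000',
    'plz_ff2b': '50030',
    'submit22': 'Suchen'
}
body = urllib.parse.urlencode(params).encode('utf-8')
print(body)  # b'name_ffb=&plz_ffb=50000&plz_ff2b=50030&submit22=Suchen'

# decoding it again; keep_blank_values=True preserves the empty field
decoded = urllib.parse.parse_qs(body.decode('utf-8'), keep_blank_values=True)
print(decoded['name_ffb'])  # ['']
```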

Upvotes: 1
