Reputation: 61
I'm trying to send a POST request to this site to perform a search in their database, but it just returns the same page instead of the results page. If I try to access the page the search leads to directly with my parameters, the request is denied. I feel like I'm missing something; maybe someone can help me out.
import urllib.request
from bs4 import BeautifulSoup
DATA = urllib.parse.urlencode({'plz_ff': 50000, 'plz_ff2': 50030})
DATA = DATA.encode('utf-8')
request = urllib.request.Request("http://www.altenheim-adressen.de/schnellsuche/index.cfm", 'POST')
# adding charset parameter to the Content-Type header.
request.add_header("text/html;charset=UTF-8","application/x-www-form-urlencoded;charset=utf-8Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0")
f = urllib.request.urlopen(request, DATA)
soup = BeautifulSoup(f)
print(soup.prettify)
f.close()
soup.close
EDIT2: The code works now, many thanks. I had to adjust the search parameters to address suche1.cfm, and it now returns the result I'm looking for. This is the finished product:
import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

params = {
    'name_ff': '',
    'strasse_ff': '',
    'plz_ff': 50000,
    'plz_ff2': 50030,
    'ort_ff': '',
    'bundesland_ff': '',
    'land_ff': '',
    'traeger_ff': '',
    'Dachverband_ff': '',
    'submit2': 'Suchen'
}
DATA = urllib.parse.urlencode(params)
DATA = DATA.encode('utf-8')
request = urllib.request.Request(
    "http://www.altenheim-adressen.de/schnellsuche/suche1.cfm",
    DATA)
# adding charset parameter to the Content-Type header.
request.add_header("Content-Type", "application/x-www-form-urlencoded;charset=utf-8")
request.add_header("User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0")
f = urllib.request.urlopen(request)
soup = BeautifulSoup(f, 'html.parser')
print(soup.prettify())
f.close()
Upvotes: 1
Views: 718
Reputation: 1123350
You need to add DATA to the Request object; right now you are sending the text 'POST' as the post body instead. The correct method would be:
request = urllib.request.Request(
"http://www.altenheim-adressen.de/schnellsuche/index.cfm",
DATA)
# adding charset parameter to the Content-Type header.
request.add_header("Content-Type", "application/x-www-form-urlencoded;charset=utf-8")
request.add_header("User-Agent", "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0")
f = urllib.request.urlopen(request)
Note that I also added User-Agent and Content-Type headers there; whatever you were adding was not a recognisable header.
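As a quick illustration of the point (not part of the original answer): the second positional argument of urllib.request.Request is the request body, and attaching a body is what switches the request from GET to POST. That is why DATA belongs there, and why passing the string 'POST' made it the body.

```python
import urllib.request

# Attaching data turns the request into a POST.
req = urllib.request.Request(
    "http://www.altenheim-adressen.de/schnellsuche/index.cfm",
    b'plz_ff=50000&plz_ff2=50030')
print(req.get_method())  # POST, because a body is present
print(req.data)          # b'plz_ff=50000&plz_ff2=50030'

# Without data, the same URL produces a GET request.
req_get = urllib.request.Request(
    "http://www.altenheim-adressen.de/schnellsuche/index.cfm")
print(req_get.get_method())  # GET
```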
I do note that the two forms on this page use different targets; neither posts to index.cfm:
>>> for form in soup.find_all('form'):
... print(form.attrs.get('action'))
...
suche1.cfm
suche1b.cfm
so if you expected to use one of these forms, you'll need to use the correct target URL here. You'd also need to verify your POST form fields; I see that the form posting to suche1b.cfm has similar form fields, but it uses plz_ffb and plz_ff2b, not plz_ff and plz_ff2.
You'll likely also want to send all form fields, even with empty strings:
params = {
    'name_ffb': '',
    'ort_ffb': '',
    'strasse_ffb': '',
    'plz_ffb': '50000',
    'plz_ff2b': '50030',
    'land_ff': '',
    'rubrik_ff': '',
    'submit22': 'Suchen'
}
DATA = urllib.parse.urlencode(params)
DATA = DATA.encode('utf-8')
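For illustration (not part of the original answer), urlencode turns such a dict into the form-encoded body string, keeping the empty fields as bare key= entries:

```python
from urllib.parse import urlencode

# A smaller dict in the same shape as the form fields above.
params = {
    'name_ffb': '',
    'plz_ffb': '50000',
    'plz_ff2b': '50030',
    'submit22': 'Suchen',
}
body = urlencode(params).encode('utf-8')
print(body)  # b'name_ffb=&plz_ffb=50000&plz_ff2b=50030&submit22=Suchen'
```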
Upvotes: 1