Reputation: 22440
I've written a script in Python to scrape some information from a webpage. The site requires a GET
request. The issue I'm facing is that the parameters
need to be merged into the URL,
so they have to be properly URL-encoded.
This is where I'm stuck: I can't encode them properly to get a valid response. I gave it a try, but it doesn't bring any result.
The script I tried:
import requests
import urllib.parse
fields ={
'/API/api/v1/Search/Properties/?f':'319 lizzie','ty':'2018','pvty':'2017','pn':'1','st':'9','so':'1','pt':'RP;PP;MH;NR','take':'20','skip':'0','page':'1','pageSize':'20'
}
payload = urllib.parse.quote_plus(fields, safe='', encoding=None, errors=None)
headers={
"User-Agent":"Mozilla/5.0"
}
page = requests.get("http://search.wcad.org/Proxy/APIProxy.ashx?", params=payload, headers=headers)
print(page.json())
The above URL should look like this:
http://search.wcad.org/Proxy/APIProxy.ashx?/API/api/v1/Search/Properties/?f=319%20LIZZIE&ty=2018&pvty=2017&pn=1&st=9&so=1&pt=RP%3BPP%3BMH%3BNR&take=20&skip=0&page=1&pageSize=20
to get the response.
Btw, this is the error I'm having with my existing script:
Traceback (most recent call last):
File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\Social.py", line 9, in <module>
payload = urllib.parse.quote_plus(fields, safe='', encoding=None, errors=None)
File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\lib\urllib\parse.py", line 728, in quote_plus
string = quote(string, safe + space, encoding, errors)
File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\lib\urllib\parse.py", line 712, in quote
return quote_from_bytes(string, safe)
File "C:\Users\ar\AppData\Local\Programs\Python\Python35-32\lib\urllib\parse.py", line 737, in quote_from_bytes
raise TypeError("quote_from_bytes() expected bytes")
TypeError: quote_from_bytes() expected bytes
Upvotes: 0
Views: 351
Reputation: 338228
This works. As the documentation indicates, there is no need to do any URL encoding yourself.
The point is that, for this URL, the key/value parameters begin after the last question mark, not the first. The second question mark must be included in the URL, as requests
only adds one when there isn't one there already.
import requests
url = "http://search.wcad.org/Proxy/APIProxy.ashx?/API/api/v1/Search/Properties/?"
params = {'f':'319 lizzie','ty':'2018','pvty':'2017','pn':'1','st':'9','so':'1','pt':'RP;PP;MH;NR','take':'20','skip':'0','page':'1','pageSize':'20'}
response = requests.get(url, params)
response.json()
results in
{ 'ResultList': [{ 'PropertyQuickRefID': 'R016698', 'PartyQuickRefID': 'O0485204', 'OwnerQuickRefID': 'R016698', 'LegacyID': None, 'PropertyNumber': 'R-13-0410-0620-50000', 'OwnerName': 'GOOCH, PHILIP L', 'SitusAddress': '319 LIZZIE ST, TAYLOR, TX 76574', 'PropertyValue': 46785.0, 'LegalDescription': 'DOAK ADDITION, BLOCK 62, LOT 5', 'NeighborhoodCode': 'T541', 'Abstract': None, 'Subdivision': 'S3564 - Doak Addition', 'PropertyType': 'Real', 'ID': 0, 'Text': None, 'TaxYear': 2018, 'PropertyValueTaxYear': 2017 }], 'HasMoreData': False, 'TotalPageCount': 1, 'CurrentPage': 1, 'RecordCount': 1, 'SearchText': '319 lizzie', 'PagingHandledByCaller': False, 'TaxYear': 2018, 'PropertyValueTaxYear': 0 }
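To see why the original attempt raised a TypeError and what the encoded parameters look like, here is a minimal standard-library sketch. It assumes that requests encodes a params dict in roughly the same way as urllib.parse.urlencode, so treat it as an approximation rather than the library's exact behaviour. The params dict is a shortened subset of the one above, and names such as query_plus and query_pct are just illustrative.

```python
from urllib.parse import quote, quote_plus, urlencode

# Base URL copied from the answer above; it already ends with the second "?".
base_url = "http://search.wcad.org/Proxy/APIProxy.ashx?/API/api/v1/Search/Properties/?"

# Shortened subset of the parameters, kept small for illustration.
params = {'f': '319 lizzie', 'ty': '2018', 'pt': 'RP;PP;MH;NR'}

# quote_plus() encodes a single string, so handing it a dict fails.
try:
    quote_plus(params)
    dict_error = None
except TypeError as exc:
    dict_error = type(exc).__name__

# urlencode() accepts a mapping and joins the encoded pairs with "&".
# By default it uses quote_plus, so spaces become "+".
query_plus = urlencode(params)

# With quote_via=quote, spaces become "%20", as in the URL from the question.
query_pct = urlencode(params, quote_via=quote)

print(dict_error)
print(base_url + query_plus)
print(base_url + query_pct)
```

Both "+" and "%20" are common ways to encode a space in a query string, so either form should normally be accepted by a server, although that depends on the server.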
Upvotes: 1