I want to send SPARQL queries to the Wikidata Query Service using query via URL-encoded POST (method 2) rather than query via GET (method 1), since GET requests are limited in length and some queries can get long when a lot of VALUES data is sent. In my past experience, query via POST directly (method 3) has problems with character encoding at the Wikidata Query Service. The three methods for performing SPARQL queries over HTTP are described in the W3C SPARQL 1.1 Protocol specification.
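For reference, the three methods differ only in where the query text travels and how it is encoded. Below is a minimal sketch using only the standard library. The helper names and the dictionary layout are my own illustration, not part of the specification or of any library, and nothing here is sent over the network.

```python
from urllib.parse import urlencode

ENDPOINT = 'https://query.wikidata.org/sparql'


def query_via_get(query):
    """Method 1: the query is URL-encoded into the request URL itself."""
    return {
        'method': 'GET',
        'url': ENDPOINT + '?' + urlencode({'query': query}),
        'headers': {},
        'body': None,
    }


def query_via_urlencoded_post(query):
    """Method 2: the query is sent as a form-urlencoded request body."""
    return {
        'method': 'POST',
        'url': ENDPOINT,
        'headers': {'Content-Type': 'application/x-www-form-urlencoded'},
        # urlencode() output is pure ASCII, e.g. b'query=SELECT+%3Fitem+...'
        'body': urlencode({'query': query}).encode('ascii'),
    }


def query_via_direct_post(query):
    """Method 3: the raw, unencoded query text is the request body."""
    return {
        'method': 'POST',
        'url': ENDPOINT,
        'headers': {'Content-Type': 'application/sparql-query'},
        'body': query.encode('utf-8'),
    }
```

Only method 1 puts the query in the URL, which is why it is the one affected by length limits.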
I want to use the Python urllib3 library rather than requests, since this code will be part of an AWS Lambda and requests is no longer a supported library in the boto3 SDK. I could import requests as a layer, but I would prefer to keep things simple by just using urllib3.
I have been making URL-encoded POST HTTP queries using the requests library for a long time with no problems. However, when I use the analogous code for the urllib3 library, I get an error. I am mystified by this behavior, particularly since the requests library is just a wrapper over urllib3. There must be something that requests is adding to the HTTP request that urllib3 isn't. I've read the docs and looked at examples for making POST requests with urllib3 and can't see anything I'm missing. I tried URL encoding the query (commented out in the code below), but that didn't make any difference.
I queried the Wikidata Query Service SPARQL endpoint using the following Python code and the requests library:
import requests
query_string = 'SELECT ?item WHERE {?item wdt:P31 wd:Q146.}LIMIT 10'
requestheader = {
    'User-Agent': 'TestAgent/0.1 (mailto:[email protected])',
    'Accept': 'application/sparql-results+json',
    'Content-Type': 'application/x-www-form-urlencoded'
}
response = requests.post('https://query.wikidata.org/sparql', data={'query' : query_string}, headers=requestheader)
print(response.status_code)
print(response.headers)
print(response.text)
As expected, I received the following response from the API:
200
{'server': 'nginx/1.18.0', 'date': 'Thu, 22 Feb 2024 21:18:02 GMT', 'content-type': 'application/sparql-results+json;charset=utf-8', 'x-first-solution-millis': '1', 'x-served-by': 'wdqs1015', 'access-control-allow-origin': '*', 'cache-control': 'public, max-age=300', 'content-encoding': 'gzip', 'vary': 'Accept, Accept-Encoding', 'age': '0', 'x-cache': 'cp1106 miss, cp1106 pass', 'x-cache-status': 'pass', 'server-timing': 'cache;desc="pass", host;desc="cp1106"', 'strict-transport-security': 'max-age=106384710; includeSubDomains; preload', 'report-to': '{ "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }', 'nel': '{ "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}', 'x-client-ip': '129.59.122.76', 'accept-ranges': 'bytes', 'content-length': '217'}
{
"head" : {
"vars" : [ "item" ]
},
"results" : {
"bindings" : [ {
"item" : {
"type" : "uri",
"value" : "http://www.wikidata.org/entity/Q378619"
}
}, {
"item" : {
"type" : "uri",
"value" : "http://www.wikidata.org/entity/Q498787"
}
}, {
"item" : {
"type" : "uri",
"value" : "http://www.wikidata.org/entity/Q677525"
}
}, {
"item" : {
"type" : "uri",
...
}
} ]
}
}
However, when I make the analogous request using the urllib3 library I get an error. Code:
import urllib3
#import urllib.parse
query_string = 'SELECT ?item WHERE {?item wdt:P31 wd:Q146.}LIMIT 10'
# Try url encoding the query string. I think this isn't necessary because I think urllib3 already does this.
#query_string = urllib.parse.quote(query_string)
#print(query_string)
http = urllib3.PoolManager()
requestheader = {
    'User-Agent': 'TestAgent/0.1 (mailto:[email protected])',
    'Accept': 'application/sparql-results+json',
    'Content-Type': 'application/x-www-form-urlencoded'
}
response = http.request('POST', 'https://query.wikidata.org/sparql', fields={'query' : query_string}, headers=requestheader)
print(response.status)
print(response.headers)
print(response.data.decode('utf-8'))
Response:
405
HTTPHeaderDict({'server': 'nginx/1.18.0', 'date': 'Mon, 26 Feb 2024 13:17:45 GMT', 'content-type': 'text/plain;charset=iso-8859-1', 'x-served-by': 'wdqs1018', 'access-control-allow-origin': '*', 'vary': 'Accept-Encoding', 'age': '0', 'x-cache': 'cp1108 miss, cp1108 pass', 'x-cache-status': 'pass', 'server-timing': 'cache;desc="pass", host;desc="cp1108"', 'strict-transport-security': 'max-age=106384710; includeSubDomains; preload', 'report-to': '{ "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }', 'nel': '{ "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}', 'x-client-ip': '166.194.158.40', 'content-length': '13'})
Not writable.
I cannot see any problems with the urllib3 request. The Wikidata Query Service is a public API and no authentication is required.
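One difference that may be worth ruling out: as far as I can tell from the urllib3 documentation, `fields=` on a POST is encoded as multipart/form-data by default, whereas passing a dict as `data=` to requests produces a form-urlencoded body. If so, the urllib3 request above would carry a multipart body under a `Content-Type: application/x-www-form-urlencoded` header. The sketch below sidesteps the question by building the form-urlencoded body explicitly and passing it as `body=`. The names `build_form_body` and `post_sparql` are hypothetical helpers, and I have not confirmed that this resolves the 405.

```python
from urllib.parse import urlencode

ENDPOINT = 'https://query.wikidata.org/sparql'


def build_form_body(query_string):
    """Return the query as an application/x-www-form-urlencoded body (bytes)."""
    # urlencode() percent-encodes the value and emits ASCII-only text.
    return urlencode({'query': query_string}).encode('ascii')


def post_sparql(query_string, user_agent):
    """Send the query as a URL-encoded POST using urllib3 (performs a network call)."""
    # Imported here so that build_form_body() can be tried without urllib3 installed.
    import urllib3

    http = urllib3.PoolManager()
    headers = {
        'User-Agent': user_agent,
        'Accept': 'application/sparql-results+json',
        'Content-Type': 'application/x-www-form-urlencoded',
    }
    # body= sends the bytes as-is, so urllib3 applies no encoding of its own.
    return http.request('POST', ENDPOINT, body=build_form_body(query_string), headers=headers)
```

Calling `post_sparql(query_string, requestheader['User-Agent'])` and printing `response.status` would show whether the body encoding is what differs between the two libraries.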