Steve Baskauf


Why doesn't a SPARQL POST query to the Wikidata SPARQL endpoint work with the Python urllib3 library when the corresponding requests query does?

I want to query the Wikidata Query Service using a URL-encoded POST (method 2) rather than GET (method 1), because GET requests have a limited length and some queries may be long when a lot of VALUES data are sent. In my past experience, querying via direct POST (method 3) causes character-encoding problems at the Wikidata Query Service. The three methods for performing SPARQL queries over HTTP are described in the W3C SPARQL 1.1 Protocol specification.
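For reference, here is a minimal sketch of how the three methods differ in the HTTP request they produce, as I understand the SPARQL 1.1 Protocol. It only builds the request parts with the standard library and sends nothing; the function name `sparql_request_parts` and the method labels are hypothetical names chosen for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the question; everything else here is illustrative.
ENDPOINT = 'https://query.wikidata.org/sparql'


def sparql_request_parts(method, query_string):
    """Return HTTP method, URL, headers, and body for one SPARQL query method.

    `method` is a hypothetical label: 'get', 'post_urlencoded', or 'post_direct'.
    """
    if method == 'get':
        # Method 1: the query travels in the URL, so URL length limits apply.
        return {
            'http_method': 'GET',
            'url': ENDPOINT + '?' + urlencode({'query': query_string}),
            'headers': {},
            'body': None,
        }
    if method == 'post_urlencoded':
        # Method 2: the query is a form parameter in a URL-encoded body.
        return {
            'http_method': 'POST',
            'url': ENDPOINT,
            'headers': {'Content-Type': 'application/x-www-form-urlencoded'},
            'body': urlencode({'query': query_string}).encode('ascii'),
        }
    if method == 'post_direct':
        # Method 3: the body is the raw, unencoded query text.
        return {
            'http_method': 'POST',
            'url': ENDPOINT,
            'headers': {'Content-Type': 'application/sparql-query'},
            'body': query_string.encode('utf-8'),
        }
    raise ValueError('unknown method label: ' + repr(method))
```

With method 2 the body is a single line of the form `query=...`, with spaces and punctuation percent-encoded, and the server relies on the Content-Type header to know how to parse it.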

I want to use the Python urllib3 library rather than requests, since this code will be part of an AWS Lambda and requests is no longer a supported library in the boto3 SDK. I could import requests as a layer, but I would prefer to keep things simple by just using urllib3.

I have been making URL-encoded POST HTTP queries using the requests library for a long time with no problems. However, when I use the analogous code for the urllib3 library, I get an error. I am mystified by this behavior, particularly since the requests library is just a wrapper over urllib3. There must be something that requests is adding to the HTTP request that urllib3 isn't. I've read the docs and looked at examples for making POST requests with urllib3 and can't see anything I'm missing. I tried URL encoding the query (commented out in the code below), but that didn't make any difference.

I queried the Wikidata Query Service SPARQL endpoint using the following Python code and the requests library:

import requests

query_string = 'SELECT ?item WHERE {?item wdt:P31 wd:Q146.}LIMIT 10'

requestheader = {
    'User-Agent': 'TestAgent/0.1 (mailto:[email protected])', 
    'Accept': 'application/sparql-results+json',
    'Content-Type': 'application/x-www-form-urlencoded'
    }
    
response = requests.post('https://query.wikidata.org/sparql', data={'query' : query_string}, headers=requestheader)
print(response.status_code)
print(response.headers)
print(response.text)

As expected, I received the following response from the API:

200
{'server': 'nginx/1.18.0', 'date': 'Thu, 22 Feb 2024 21:18:02 GMT', 'content-type': 'application/sparql-results+json;charset=utf-8', 'x-first-solution-millis': '1', 'x-served-by': 'wdqs1015', 'access-control-allow-origin': '*', 'cache-control': 'public, max-age=300', 'content-encoding': 'gzip', 'vary': 'Accept, Accept-Encoding', 'age': '0', 'x-cache': 'cp1106 miss, cp1106 pass', 'x-cache-status': 'pass', 'server-timing': 'cache;desc="pass", host;desc="cp1106"', 'strict-transport-security': 'max-age=106384710; includeSubDomains; preload', 'report-to': '{ "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }', 'nel': '{ "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}', 'x-client-ip': '129.59.122.76', 'accept-ranges': 'bytes', 'content-length': '217'}
{
  "head" : {
    "vars" : [ "item" ]
  },
  "results" : {
    "bindings" : [ {
      "item" : {
        "type" : "uri",
        "value" : "http://www.wikidata.org/entity/Q378619"
      }
    }, {
      "item" : {
        "type" : "uri",
        "value" : "http://www.wikidata.org/entity/Q498787"
      }
    }, {
      "item" : {
        "type" : "uri",
        "value" : "http://www.wikidata.org/entity/Q677525"
      }
    }, {
      "item" : {
        "type" : "uri",
...
      }
    } ]
  }
}

However, when I make the analogous request using the urllib3 library I get an error. Code:

import urllib3
#import urllib.parse

query_string = 'SELECT ?item WHERE {?item wdt:P31 wd:Q146.}LIMIT 10'
# Try URL-encoding the query string. This is probably unnecessary, since I think urllib3 already does it.
#query_string = urllib.parse.quote(query_string)
#print(query_string)

http = urllib3.PoolManager()

requestheader = {
    'User-Agent': 'TestAgent/0.1 (mailto:[email protected])',
    'Accept': 'application/sparql-results+json',
    'Content-Type': 'application/x-www-form-urlencoded'
    }

response = http.request('POST', 'https://query.wikidata.org/sparql', fields={'query' : query_string}, headers=requestheader)
print(response.status)
print(response.headers)
print(response.data.decode('utf-8'))

Response:

405
HTTPHeaderDict({'server': 'nginx/1.18.0', 'date': 'Mon, 26 Feb 2024 13:17:45 GMT', 'content-type': 'text/plain;charset=iso-8859-1', 'x-served-by': 'wdqs1018', 'access-control-allow-origin': '*', 'vary': 'Accept-Encoding', 'age': '0', 'x-cache': 'cp1108 miss, cp1108 pass', 'x-cache-status': 'pass', 'server-timing': 'cache;desc="pass", host;desc="cp1108"', 'strict-transport-security': 'max-age=106384710; includeSubDomains; preload', 'report-to': '{ "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }', 'nel': '{ "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}', 'x-client-ip': '166.194.158.40', 'content-length': '13'})
Not writable.

I cannot see any problems with the urllib3 request. The Wikidata Query Service is a public API and no authentication is required.
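One possible explanation, which I have not verified against the service: as far as I can tell, urllib3 encodes `fields=` on a POST as `multipart/form-data` by default, whereas requests form-encodes a `data=` dict. If so, the body sent above would be multipart while the header I set claims `application/x-www-form-urlencoded`, and the endpoint may fail to find the `query` parameter. A minimal sketch of a workaround under that assumption is to URL-encode the body myself and pass it via `body=`. The function names `build_form_post` and `run_query` are hypothetical, and `run_query` is untested against the live endpoint.

```python
from urllib.parse import urlencode

ENDPOINT = 'https://query.wikidata.org/sparql'


def build_form_post(query_string, user_agent):
    """Return (body, headers) for a URL-encoded SPARQL POST (method 2)."""
    # urlencode produces 'query=...' with spaces and punctuation escaped.
    body = urlencode({'query': query_string}).encode('ascii')
    headers = {
        'User-Agent': user_agent,
        'Accept': 'application/sparql-results+json',
        'Content-Type': 'application/x-www-form-urlencoded',
    }
    return body, headers


def run_query(query_string, user_agent):
    """Send the query with urllib3, bypassing its default field encoding."""
    import urllib3  # imported here so build_form_post works without it

    http = urllib3.PoolManager()
    body, headers = build_form_post(query_string, user_agent)
    # Passing body= (not fields=) sends the bytes exactly as built above.
    return http.request('POST', ENDPOINT, body=body, headers=headers)
```

Usage would be `response = run_query(query_string, 'TestAgent/0.1 (...)')` followed by the same three print statements as above. Keeping `fields=` and adding `encode_multipart=False` to the `http.request` call should, I believe, have the same effect, since that flag asks urllib3 to form-encode the fields instead.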
