I'm trying to use a SPARQL regex to select literals whose parentheses are balanced. For example, "( (1) ((2)) (((3))) 4)" should be returned, but "( (1) ((2)) (((3)) 4)", where I removed the closing parenthesis after the "3", should not.
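To pin down what "balanced" means here, the expected results can be defined with a plain depth counter. This is only a minimal reference sketch in Python, used for illustration because SPARQL has no loops; the helper name is_balanced is hypothetical.

```python
def is_balanced(text: str) -> bool:
    """Return True if every '(' is closed by a later ')' and no ')' is unmatched."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before any matching '('
                return False
    return depth == 0  # every '(' was eventually closed


print(is_balanced("( (1) ((2)) (((3))) 4)"))  # True
print(is_balanced("( (1) ((2)) (((3)) 4)"))   # False: one '(' is never closed
```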
I looked for a suitable regex in this question: Regular expression to match balanced parentheses
and have been trying to use the regex suggested there by rogal111:
\(([^()]|(?R))*\)
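This regex uses PCRE syntax, which I understood to be the W3C standard that SPARQL follows. (In fact, SPARQL 1.1 defines REGEX in terms of the XPath/XQuery fn:matches regex syntax, which has no recursion construct such as (?R).) According to the linked example, http://regex101.com/r/lF0fI1/1, the pattern should work for the examples above.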
I've tested this on both a Jena-based triple store and a Virtuoso-based triple store.
Jena: when I use the pattern in the SPARQL query below, it reports that the (?R) inline modifier is unknown.
SELECT ?l
WHERE
{
BIND("(test)" AS ?l)
FILTER REGEX(?l, "\\(([^()]|(?R))*\\)").
}
The complete error message is:
Regex pattern exception: java.util.regex.PatternSyntaxException: Unknown inline modifier near index 11 \(([^()]|(?R))*\)
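The exception comes from java.util.regex, the engine Jena uses to evaluate REGEX, and that engine simply has no recursion construct. As an illustration only, Python's built-in re module is another non-recursive engine and rejects the same pattern at compile time; this sketch assumes that behaviour is representative and does not exercise Jena itself. The helper name compiles is hypothetical.

```python
import re

# The PCRE pattern from the question; (?R) means "recurse into the whole pattern".
PATTERN = r"\(([^()]|(?R))*\)"


def compiles(pattern: str) -> bool:
    """Return True if Python's built-in (non-recursive) re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        print(f"rejected: {exc}")
        return False
    return True


print(compiles(PATTERN))        # False: re has no (?R) recursion
print(compiles(r"\([^()]*\)"))  # True: non-recursive, matches one innermost group
```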
Virtuoso: the Virtuoso-based triple store (tested on https://sparql.uniprot.org/sparql) accepts the pattern, but returns incorrect results. In the query below, the unbalanced literal "((test)" is still returned:
SELECT ?l
WHERE
{
BIND("((test)" AS ?l)
FILTER REGEX(?l, "\\(([^()]|(?R))*\\)").
}
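A likely explanation for the Virtuoso result: SPARQL REGEX, like XPath fn:matches, succeeds if the pattern matches any substring of the value unless it is anchored with ^ and $. The literal "((test)" contains the balanced substring "(test)", so the filter passes. The Python sketch below shows the difference using a non-recursive stand-in pattern (an assumption, chosen so it runs with the standard library); the helper names are hypothetical.

```python
import re

# Non-recursive stand-in: one (...) group containing no further parentheses.
INNERMOST = r"\([^()]*\)"


def regex_filter(value: str) -> bool:
    """Unanchored test, like SPARQL REGEX: true if ANY substring matches."""
    return re.search(INNERMOST, value) is not None


def anchored_filter(value: str) -> bool:
    """Anchored test: true only if the WHOLE value matches."""
    return re.search("^" + INNERMOST + "$", value) is not None


print(regex_filter("((test)"))     # True: the substring "(test)" matches
print(anchored_filter("((test)"))  # False
print(anchored_filter("(test)"))   # True
```

With a recursive pattern, anchoring needs care: (?R) re-enters the whole expression, anchors included, so in a PCRE-style engine the recursion would have to target a capturing group instead, e.g. ^(\(([^()]|(?1))*\))$.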
I'm not sure whether this is intentional, a bug, or whether I'm doing something wrong. Ultimately I want to get it working on the Jena-based triple store. Can anyone help me with this?
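To clarify and expand on my comment about using REPLACE, the following should work. It deletes every balanced parenthesized group with REPLACE, then sets ?result to true only if no parenthesis remains in what is left: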
SELECT *
{
VALUES ?value {
"( (1) ((2)) (((3))) 4)"
"( (1) ((2)) (((3)) 4)"
"before (test) after"
"before ((test) after"
}
bind(!regex(
replace(?value, '(?=\\()(?:(?=.*?\\((?!.*?\\1)(.*\\)(?!.*\\2).*))(?=.*?\\)(?!.*?\\2)(.*)).)+?.*?(?=\\1)[^(]*(?=\\2$)', '')
, '[()]') as ?result)
}
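The same "delete balanced groups, then look for leftover parentheses" idea can be written without lookaheads by repeatedly removing innermost groups until nothing changes. The Python sketch below is a swapped-in technique for illustration, not the query above translated line by line; the helper name balanced_by_replace is hypothetical.

```python
import re

# An innermost group: "(" ... ")" with no parentheses inside.
INNERMOST = re.compile(r"\([^()]*\)")


def balanced_by_replace(value: str) -> bool:
    """Delete innermost groups until none are left, then check no parens remain."""
    while True:
        value, removed = INNERMOST.subn("", value)
        if removed == 0:  # nothing left to peel off
            break
    return re.search(r"[()]", value) is None


for v in ["( (1) ((2)) (((3))) 4)", "( (1) ((2)) (((3)) 4)",
          "before (test) after", "before ((test) after"]:
    print(repr(v), balanced_by_replace(v))  # True, False, True, False
```

Because SPARQL has no loop, the equivalent there would be a fixed number k of nested REPLACE(..., "\\([^()]*\\)", "") calls inside !regex(..., "[()]"), which only handles nesting up to depth k.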