Reputation: 858
I'm trying to create a list of companies within a particular industry type (PaaS/SaaS) using DBpedia and SPARQL. I read this post on creating a list of companies with a certain number of employees, and I wanted to FILTER for a particular industry within a SPARQL query such as this one:
https://gist.github.com/szydan/e801fa687587d9eb0f6a
I tried this query (omitting prefixes here):
CONSTRUCT{
?iri a dbpedia-owl:Company;
foaf:name ?companyName;
dbpedia-owl:abstract ?description;
owl:sameAs ?sameAs;
dbpedia:countryCode ?countryCode;
sindicetech:locationName ?locationName;
sindicetech:locationCityName ?locationCityName
}WHERE{
?iri a dbpedia-owl:Company.
OPTIONAL{
?iri dbpedia-owl:abstract ?description.
FILTER( lang(?description) = "en")
FILTER (regex(?description, '^platform$')) .
}
{
OPTIONAL{
?iri foaf:name ?companyName.
FILTER( lang(?companyName) = "en")
}
}UNION{
OPTIONAL{
?iri rdfs:label ?companyName .
FILTER( lang(?companyName) = "en")
}
}
OPTIONAL{
?iri owl:sameAs ?sameAs
}
{
OPTIONAL{
?iri dbpedia:locationCountry ?country.
?country dbpedia:countryCode ?countryCode
FILTER( lang(?countryCode) = "en")
}
}UNION{
OPTIONAL{
?iri dbpedia-owl:locationCountry ?country.
?country dbpedia:countryCode ?countryCode
FILTER( lang(?countryCode) = "en")
}
}
OPTIONAL{
?iri dbpedia-owl:location ?location.
?location dbpedia:name ?locationName
FILTER( lang(?locationName) = "en")
}
OPTIONAL{
?iri dbpedia-owl:locationCity ?locationCity.
?locationCity rdfs:label ?locationCityName
FILTER( lang(?locationCityName) = "en")
}
}
LIMIT 100
to see if I could find platform-as-a-service companies, but I'm getting all kinds of results that don't have that word in the description. Perhaps the FILTER (regex(?description, '^platform$'))
regex is wrong? Is there a way I could filter for:
?industrySector dbpedia-owl:industry <http://dbpedia.org/resource/Platform_as_a_service>
Or perhaps I should be trying to narrow it down by filtering ontologically?
http://mappings.dbpedia.org/index.php/OntologyProperty:Industry
I'm using DBpedia's Virtuoso endpoint to test these queries, and ideally I'd like to arrive at an RDF hierarchy of categories with a CONSTRUCT query that gives me all companies within a particular industry, such as PaaS, SaaS, etc. But I'm not married to CONSTRUCT queries, and I'll take any advice!
Upvotes: 1
Views: 759
Reputation: 85823
First, two notes.
{
OPTIONAL{
?iri foaf:name ?companyName.
FILTER( lang(?companyName) = "en")
}
}UNION{
OPTIONAL{
?iri rdfs:label ?companyName .
FILTER( lang(?companyName) = "en")
}
}
This union of two optional blocks can be written more compactly as either
optional {
?iri rdfs:label|foaf:name ?companyName .
filter langMatches(lang(?companyName),"en")
}
or
values ?nameProperty { rdfs:label foaf:name }
optional {
?iri ?nameProperty ?companyName .
filter langMatches(lang(?companyName),"en")
}
Property paths can make some other parts of your query shorter, too. E.g.,
?iri dbpedia-owl:locationCity ?locationCity.
?locationCity rdfs:label ?locationCityName
can be just:
?iri dbpedia-owl:locationCity/rdfs:label ?locationCityName
since you didn't use ?locationCity anywhere.
Finally, as to
I'm getting all kinds of results that don't have that word in the description. Perhaps the FILTER (regex(?description, '^platform$')) regex is wrong?
The regular expression doesn't quite do what you want it to:
FILTER (regex(?description, '^platform$'))
That will only match when the characters in the string are exactly "platform". It seems more like you'd want to check whether the description contains the word platform, in which case you can use contains, as in contains(?description,"platform"). But even if you update like that, you'll have
optional {
?iri dbpedia-owl:abstract ?description.
filter contains(?description,"platform")
filter langMatches(lang(?description),"en")
}
and that's still inside an optional block. The whole point of optional is that you can get results even if the optional part doesn't match. If you want to require that there is a description that contains the word platform, don't make it optional.
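One caveat: contains is case-sensitive, so it would skip an abstract that only says "Platform". A minimal sketch of a case-insensitive variant follows, using regex without anchors and with the "i" flag. The prefix declaration is an assumption added so the query stands alone; it takes dbpedia-owl: to be the DBpedia ontology namespace.

```sparql
# Sketch only: a required (non-optional) English abstract, matched
# case-insensitively. The prefix IRI below is an assumption.
prefix dbpedia-owl: <http://dbpedia.org/ontology/>

select ?iri ?description where {
  ?iri a dbpedia-owl:Company ;
       dbpedia-owl:abstract ?description .
  filter langMatches(lang(?description), "en")
  # No ^ or $ anchors, so "platform" may appear anywhere in the abstract.
  # str() strips the language tag before matching.
  filter regex(str(?description), "platform", "i")
}
limit 100
```

Because the abstract pattern sits outside any optional block, companies without a matching abstract are dropped rather than returned with an unbound ?description.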
After all that, your query becomes:
prefix sindicetech: <urn:ex:sindicetech:>
construct {
?iri a dbpedia-owl:Company ;
foaf:name ?companyName ;
dbpedia-owl:abstract ?description ;
owl:sameAs ?sameAs ;
dbpedia:countryCode ?countryCode ;
sindicetech:locationName ?locationName ;
sindicetech:locationCityName ?locationCityName
}
where {
?iri a dbpedia-owl:Company ;
dbpedia-owl:abstract ?description .
filter langMatches(lang(?description),"en") .
filter contains(?description,"platform") .
optional {
?iri foaf:name|rdfs:label ?companyName.
filter langMatches(lang(?companyName),"en")
}
optional {
?iri owl:sameAs ?sameAs
}
optional {
?iri (dbpedia:locationCountry|dbpedia-owl:locationCountry)/dbpedia:countryCode ?countryCode .
filter langMatches(lang(?countryCode),"en")
}
optional {
?iri dbpedia-owl:location/dbpedia:name ?locationName
filter langMatches(lang(?locationName),"en")
}
optional {
?iri dbpedia-owl:locationCity/rdfs:label ?locationCityName
filter langMatches(lang(?locationCityName),"en")
}
}
limit 100
You can see that the results are about companies with "platform" in their descriptions.
Note that none of them have any dbpedia:countryCode properties though. I don't know where you found that property, but it doesn't appear to be used anywhere in DBpedia. The query select (count(*) as ?n) { ?x dbpedia:countryCode ?y } returns 0.
Is there a way I could filter for:
?industrySector dbpedia-owl:industry <http://dbpedia.org/resource/Platform_as_a_service>
If you look at http://dbpedia.org/resource/Platform_as_a_service you'll see that it's related to a number of companies (but not all that many) by a few different properties.
You might just ask for anything that's a company that's related to this by any property. E.g.,
select distinct ?company where {
?company a dbpedia-owl:Company ;
?property dbpedia:Platform_as_a_service .
}
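To check which properties actually link companies to that resource, you could group by the property and count. This is a sketch rather than a tested query; the two prefix IRIs are assumptions about what dbpedia-owl: and dbpedia: expand to on the endpoint.

```sparql
# Sketch: count companies per linking property.
# Both prefix IRIs are assumptions about the endpoint's namespaces.
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
prefix dbpedia:     <http://dbpedia.org/resource/>

select ?property (count(distinct ?company) as ?n) where {
  ?company a dbpedia-owl:Company ;
           ?property dbpedia:Platform_as_a_service .
}
group by ?property
order by desc(?n)
```

If one property (for instance dbpedia-owl:industry) turns out to cover most of the companies, you can replace ?property with it in the queries here to get a stricter match.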
You can use that approach to construct more detailed information, too. I'd end up with something like this:
prefix sindicetech: <urn:ex:sindicetech:>
construct {
?company a dbpedia-owl:Company ;
foaf:name ?label ;
dbpedia-owl:abstract ?abstract ;
owl:sameAs ?_company ;
sindicetech:location [ sindicetech:city ?city ;
sindicetech:country ?country ] .
}
where {
?company a dbpedia-owl:Company ;
?property dbpedia:Platform_as_a_service ;
rdfs:label ?label ;
dbpedia-owl:abstract ?abstract .
filter langMatches(lang(?label),"en")
filter langMatches(lang(?abstract),"en")
optional {
?company owl:sameAs ?_company
}
optional {
?company dbpedia-owl:location [ rdfs:label ?city ;
dbpedia-owl:country/rdfs:label ?country ] .
filter langMatches(lang(?city),"en")
filter langMatches(lang(?country),"en")
}
}
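To cover several industries at once (PaaS, SaaS, and so on), one option is a values block listing the industry resources. In the sketch below, dbpedia:Software_as_a_service is assumed to be the matching SaaS resource, the prefix IRIs are assumptions, and the constructed dbpedia-owl:industry triple is a simplification: it is emitted whatever property linked the company to the industry in the source data.

```sparql
# Sketch: companies grouped under more than one industry resource.
# Prefix IRIs and the Software_as_a_service resource are assumptions.
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
prefix dbpedia:     <http://dbpedia.org/resource/>
prefix rdfs:        <http://www.w3.org/2000/01/rdf-schema#>
prefix foaf:        <http://xmlns.com/foaf/0.1/>

construct {
  ?company a dbpedia-owl:Company ;
           foaf:name ?label ;
           # Normalised link: written out regardless of the original ?property.
           dbpedia-owl:industry ?industry .
}
where {
  values ?industry { dbpedia:Platform_as_a_service dbpedia:Software_as_a_service }
  ?company a dbpedia-owl:Company ;
           ?property ?industry ;
           rdfs:label ?label .
  filter langMatches(lang(?label), "en")
}
```

The result is a flat company-to-industry graph, which may be a reasonable starting point for the category hierarchy asked about in the question.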
Upvotes: 5