Reputation: 21840
I thought it would be interesting to ask DBpedia which of its resources are the most predicate-rich.
I tried running the following query:
SELECT DISTINCT ?s (count(?p) AS ?info)
WHERE {
?s ?p ?o .
}
GROUP BY ?s ?p
ORDER BY desc(?info)
LIMIT 50
and it timed out, so I can't verify whether or not it was the right query.
So, I'm left with the following two questions:
Upvotes: 2
Views: 78
Reputation: 85883
Suppose you've got data like this:
@prefix : <http://stackoverflow.com/q/22391927/1281433/> .
:a :p 1, 2, 3 ;
:q 4, 5 .
:b :p 1, 2 ;
:q 3, 4 ;
:r 5, 6 .
:c :p 1 ;
:q 2 ;
:r 3 .
Then you can ask how many triples each resource is the subject of with a query like this:
prefix : <http://stackoverflow.com/q/22391927/1281433/>
select ?s (count(*) as ?n) where {
?s ?p ?o
}
group by ?s
order by desc(?n)
----------
| s | n |
==========
| :b | 6 |
| :a | 5 |
| :c | 3 |
----------
Notice that you only want to group by ?s if you're interested in how many triples each resource is the subject of. In your original query, where you group by ?s ?p, you're sorting (subject, predicate) pairs by how many values they have. E.g.,
prefix : <http://stackoverflow.com/q/22391927/1281433/>
select ?s ?p (count(*) as ?n) where {
?s ?p ?o
}
group by ?s ?p
order by desc(?n)
---------------
| s | p | n |
===============
| :a | :p | 3 |
| :b | :p | 2 |
| :a | :q | 2 |
| :b | :q | 2 |
| :b | :r | 2 |
| :c | :p | 1 |
| :c | :q | 1 |
| :c | :r | 1 |
---------------
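To make the difference between the two groupings concrete, here is a minimal Python sketch that mimics them over the same example data. It is only an illustration, not how a SPARQL engine evaluates the query: the TRIPLES list and the function names are made up for this example, and the prefix is dropped from the resource names. The third function shows what counting distinct predicates per subject (count(distinct ?p) with group by ?s in SPARQL) would give, which may be closer to "predicate-rich" than a raw triple count.

```python
from collections import Counter

# The example data from above as (subject, predicate, object) triples.
# Names are shortened (":a" -> "a") purely for illustration.
TRIPLES = [
    ("a", "p", 1), ("a", "p", 2), ("a", "p", 3), ("a", "q", 4), ("a", "q", 5),
    ("b", "p", 1), ("b", "p", 2), ("b", "q", 3), ("b", "q", 4),
    ("b", "r", 5), ("b", "r", 6),
    ("c", "p", 1), ("c", "q", 2), ("c", "r", 3),
]


def triples_per_subject(triples):
    """Mimics: select ?s (count(*) as ?n) ... group by ?s"""
    return Counter(s for s, _, _ in triples)


def triples_per_subject_predicate(triples):
    """Mimics: select ?s ?p (count(*) as ?n) ... group by ?s ?p"""
    return Counter((s, p) for s, p, _ in triples)


def distinct_predicates_per_subject(triples):
    """Mimics: select ?s (count(distinct ?p) as ?n) ... group by ?s"""
    distinct_pairs = {(s, p) for s, p, _ in triples}
    return Counter(s for s, _ in distinct_pairs)


if __name__ == "__main__":
    # Sorted by count, descending, like "order by desc(?n)".
    print(triples_per_subject(TRIPLES).most_common())
    print(triples_per_subject_predicate(TRIPLES).most_common())
    print(dict(distinct_predicates_per_subject(TRIPLES)))
```

Grouping by subject alone gives b 6, a 5, c 3, matching the first result table, while the distinct-predicate count gives 2 for a and 3 for both b and c.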
I don't expect that you'll be able to run a query like this on DBpedia. It requires touching every triple in the data, and then ordering the resources by how many triples they're the subject of. That sounds like a lot of work. You might be able to download the data, load it into a local endpoint and run the query, and so avoid the timeout, but I wouldn't be surprised if it still takes a while.
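If you do download the data, one alternative to loading it into a local endpoint is to stream the dump and count subjects directly. The following is a rough sketch of that different technique, not a SPARQL query. It assumes the dump is in N-Triples format (one triple per line, subject first), and the function name, sample data and file name are hypothetical. It keeps one counter per distinct subject in memory, so for a dataset the size of DBpedia it may still need a lot of RAM and time.

```python
from collections import Counter


def count_subjects(lines):
    """Count how many triples each subject appears in.

    Assumes N-Triples input: one triple per line, with the subject
    (an IRI like <...> or a blank node like _:b0) as the first
    whitespace-delimited token.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject = line.split(None, 1)[0]
        counts[subject] += 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory sample; for a real dump you would pass an open
    # file object instead, e.g. open("dump.nt", encoding="utf-8")
    # (hypothetical file name).
    sample = [
        '<http://example.org/a> <http://example.org/p> "1" .',
        '<http://example.org/a> <http://example.org/q> "2" .',
        '<http://example.org/b> <http://example.org/p> "1" .',
    ]
    for subject, n in count_subjects(sample).most_common(50):
        print(subject, n)
```

Because the input is read line by line, the file itself never has to fit in memory; only the per-subject counts do.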
Upvotes: 3