FranMercaes

Standalone Blazegraph Wikidata server returns no results

I just successfully created a local standalone Blazegraph instance and loaded the Wikidata database, following the instructions here: https://github.com/wikimedia/wikidata-query-rdf/blob/master/docs/getting-started.md.

This is the "super" command I used:

git clone --recurse-submodules https://gerrit.wikimedia.org/r/wikidata/query/rdf wikidata-query-rdf && cd wikidata-query-rdf && mvn package && cd dist/target && unzip service-*-dist.zip && cd service-*/

nohup ./runBlazegraph.sh &

mkdir data && wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.gz && mkdir data/split && ./munge.sh -f latest-lexemes.ttl.gz -d data/split -l en,es -s && ./loadRestAPI.sh -n wdq -d `pwd`/data/split && ./runUpdate.sh -n wdq -l en,es -s

./runUpdate.sh is still running, but it has already pulled updates up to 2019-09-23T13:31:56Z.
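To see how far the local copy is still behind, you can compare the last timestamp the updater reached with the current time. A minimal Python sketch; update_lag is a hypothetical helper, and the timestamp is assumed to be copied by hand from the updater's progress output:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def update_lag(last_update: str, now: Optional[datetime] = None) -> timedelta:
    """Return how far `last_update` lies behind `now` (default: current UTC time).

    `last_update` is an ISO 8601 UTC timestamp with a trailing 'Z',
    such as the one quoted from the runUpdate.sh progress above.
    """
    # datetime.fromisoformat() rejects a trailing 'Z' before Python 3.11,
    # so swap it for an explicit +00:00 offset.
    ts = datetime.fromisoformat(last_update.replace("Z", "+00:00"))
    if now is None:
        now = datetime.now(timezone.utc)
    return now - ts


# Fixed "now" so the example is reproducible.
lag = update_lag("2019-09-23T13:31:56Z",
                 now=datetime(2019, 9, 25, 13, 31, 56, tzinfo=timezone.utc))
print(lag)  # 2 days, 0:00:00
```

While the lag is large, queries that depend on recent edits can miss data that the public service already has.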

To test it, I compared the results from my local instance with those from the Wikidata Query Service, and they differ.

For instance, if I run the "Cats" query from the examples:

#Cats
SELECT ?item ?itemLabel 
WHERE 
{
  ?item wdt:P31 wd:Q146.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

The Wikidata Query Service returns 142 results. Mine returns NONE.
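To compare the two services on the same query, a small script can send it to each SPARQL endpoint and count the rows in the standard SPARQL JSON results format. This is a sketch, not an official tool: it assumes the local Blazegraph endpoint is at http://localhost:9999/bigdata/namespace/wdq/sparql (Blazegraph's default port plus the wdq namespace passed to loadRestAPI.sh above), and the helper names are hypothetical. The query is the Cats example with the language list simplified to "en".

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoints: the local URL relies on Blazegraph's default port 9999 and
# the "wdq" namespace given to loadRestAPI.sh; adjust both to match your setup.
LOCAL_ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The "Cats" example query, with the language list simplified to "en".
CATS_QUERY = """
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 wd:Q146.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a GET request that asks the endpoint for SPARQL 1.1 JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        # The public service expects a descriptive User-Agent; this one is a placeholder.
        "User-Agent": "result-count-check/0.1 (example script)",
    })


def count_rows(results: dict) -> int:
    """Count the solutions in a parsed SPARQL JSON results document."""
    return len(results["results"]["bindings"])


def run_count(endpoint: str, query: str, timeout: float = 60.0) -> int:
    """Send `query` to `endpoint` and return the number of result rows."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return count_rows(json.load(resp))
```

Usage: run_count(LOCAL_ENDPOINT, CATS_QUERY) and run_count(WDQS_ENDPOINT, CATS_QUERY) return the two row counts, so the same check can be repeated while the updater catches up.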

If I run the "Recent Events" query from the examples:

#Recent Events
SELECT ?event ?eventLabel ?date
WHERE
{
    # find events
    ?event wdt:P31/wdt:P279* wd:Q1190554.
    # with a point in time or start date
    OPTIONAL { ?event wdt:P585 ?date. }
    OPTIONAL { ?event wdt:P580 ?date. }
    # but at least one of those
    FILTER(BOUND(?date) && DATATYPE(?date) = xsd:dateTime).
    # not in the future, and not more than 31 days ago
    BIND(NOW() - ?date AS ?distance).
    FILTER(0 <= ?distance && ?distance < 31).
    # and get a label as well
    OPTIONAL {
        ?event rdfs:label ?eventLabel.
        FILTER(LANG(?eventLabel) = "en").
    }
}
# limit to 10 results so we don't timeout
LIMIT 10

The Wikidata Query Service returns 10 results, as expected given the LIMIT. Mine returns only ONE.

Why do the results differ? Did I do something wrong?

Thank you in advance.

Additional info about the machine where I'm running Wikidata, just in case it's important.

Answers (1)

Wolfgang Fahl

In January 2018 I did a successful Wikidata import following the instructions at http://wiki.bitplan.com/index.php/WikiData#Import. My first attempt, on a standard hard disk, was so slow that I estimated a 10-day import time. When I switched to an SSD, the import time went down to 2.9 days. At the time I needed a 512 GB SSD to fit the .jnl journal file.

Wikidata has grown further since 2018-01, so you can expect at least a proportional increase in import time. There has been some recent discussion about importing on the Wikidata mailing list, where you'll find hints on alternatives and on speed issues.
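The proportional estimate is a one-line calculation: scale a measured import time by the ratio of the current dump size to the size of the dump that was imported. A minimal sketch; estimate_import_days is a hypothetical helper and the sizes in the example are placeholders, not measured values. Since the growth is "at least" proportional, treat the result as a lower bound.

```python
def estimate_import_days(measured_days: float, measured_size: float, new_size: float) -> float:
    """Lower-bound import time, assuming time grows linearly with dump size.

    Sizes may be in any unit (GB, number of triples, ...) as long as both use the same one.
    """
    if measured_size <= 0:
        raise ValueError("measured_size must be positive")
    return measured_days * (new_size / measured_size)


# Placeholder sizes: a dump twice as large as the one imported in 2018-01 (2.9 days on SSD).
print(estimate_import_days(2.9, 1.0, 2.0))  # 5.8
```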

Until the import has finished, you won't get sensible results, because the triples that link entities together might not be loaded yet.
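The effect of missing linking triples can be seen on a toy graph. The sketch below evaluates the wdt:P31/wdt:P279* path used in the Recent Events query over a small in-memory set of triples; the ex: identifiers are made up for illustration and are not real Wikidata data. Once a single wdt:P279 link is missing, an event that should match no longer does.

```python
from collections import deque

# Toy (subject, predicate, object) triples; the ex: identifiers are hypothetical.
FULL = {
    ("ex:event1", "wdt:P31", "ex:ClassA"),
    ("ex:ClassA", "wdt:P279", "ex:ClassB"),
    ("ex:ClassB", "wdt:P279", "wd:Q1190554"),  # wd:Q1190554 is the target class in the query
    ("ex:event2", "wdt:P31", "wd:Q1190554"),
}


def subclass_closure(triples, start):
    """Classes reachable from `start` via zero or more wdt:P279 edges (wdt:P279*)."""
    seen = {start}
    queue = deque([start])
    while queue:
        cls = queue.popleft()
        for s, p, o in triples:
            if s == cls and p == "wdt:P279" and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


def instances_of(triples, target):
    """Subjects matching the pattern ?x wdt:P31/wdt:P279* target."""
    return {
        s for s, p, o in triples
        if p == "wdt:P31" and target in subclass_closure(triples, o)
    }


print(sorted(instances_of(FULL, "wd:Q1190554")))     # ['ex:event1', 'ex:event2']

# Simulate an unfinished import: one subclass link has not been loaded yet.
partial = FULL - {("ex:ClassA", "wdt:P279", "ex:ClassB")}
print(sorted(instances_of(partial, "wd:Q1190554")))  # ['ex:event2']
```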

For the Cats example, my 2018-01 import returns 111 results after 2 seconds. The results of the events example depend on when you run the query, when you did the import, and how many events per month fall in that period. I changed the 31 days to 600 to get 10 results after about 30 seconds. If I run the query with the 31-day window and no LIMIT, it still has not returned a result after 7 hours ...
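The dependence on query time comes from the filter itself: ?distance is NOW() - ?date, and, as the query's own comment says, FILTER(0 <= ?distance && ?distance < 31) keeps dates that are not in the future and not more than 31 days ago, measured from the moment the query runs. A data set whose newest edits are old therefore has fewer dates inside the window, and widening it (31 to 600) lets more through. A minimal Python sketch of that filter; the helper names and event dates are hypothetical, not real Wikidata data.

```python
from datetime import datetime, timedelta, timezone


def in_window(date: datetime, now: datetime, max_days: float = 31) -> bool:
    """Mimic FILTER(0 <= ?distance && ?distance < max_days), ?distance = now - date in days."""
    distance = (now - date) / timedelta(days=1)
    return 0 <= distance < max_days


def recent(dates, now, max_days=31):
    """Keep only the dates that pass the window filter."""
    return [d for d in dates if in_window(d, now, max_days)]


utc = timezone.utc
# Hypothetical event dates.
dates = [
    datetime(2019, 9, 1, tzinfo=utc),   # 39 days before `now`
    datetime(2019, 9, 20, tzinfo=utc),  # 20 days before `now`
    datetime(2019, 11, 1, tzinfo=utc),  # in the future relative to `now`
]
now = datetime(2019, 10, 10, tzinfo=utc)

print(len(recent(dates, now)))                # 1
print(len(recent(dates, now, max_days=600)))  # 2
```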
