Reputation: 151
I just successfully created a local standalone Blazegraph instance and loaded the Wikidata database into it, following the instructions here: https://github.com/wikimedia/wikidata-query-rdf/blob/master/docs/getting-started.md.
This is the "super" command I used:
git clone --recurse-submodules https://gerrit.wikimedia.org/r/wikidata/query/rdf wikidata-query-rdf \
  && cd wikidata-query-rdf \
  && mvn package \
  && cd dist/target \
  && unzip service-*-dist.zip \
  && cd service-*/
nohup ./runBlazegraph.sh &
mkdir data \
  && wget https://dumps.wikimedia.org/wikidatawiki/entities/latest-lexemes.ttl.gz \
  && mkdir data/split \
  && ./munge.sh -f latest-lexemes.ttl.gz -d data/split -l en,es -s \
  && ./loadRestAPI.sh -n wdq -d "$(pwd)"/data/split \
  && ./runUpdate.sh -n wdq -l en,es -s
./runUpdate.sh is still running, but it has already pulled in updates up to 2019-09-23T13:31:56Z.
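For reference, here is a quick sanity check against the local endpoint that counts the triples loaded so far (a sketch, assuming the default endpoint URL from the getting-started guide, http://localhost:9999/bigdata/namespace/wdq/sparql):
# count all triples in the local Blazegraph store (the endpoint URL is an assumption)
curl -s -G 'http://localhost:9999/bigdata/namespace/wdq/sparql' \
  --data-urlencode 'query=SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }' \
  -H 'Accept: application/sparql-results+json'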
To test it, I compared my local Wikidata results with the Wikidata Query Service results, and there are differences.
For instance, if I run the "Cats" query from the examples:
#Cats
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 wd:Q146.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Wikidata Query Service has 142 results. I have NONE.
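To narrow down where the results go missing, a hedged diagnostic is to count the matching triples directly in the local store; note that, unlike the WDQS GUI, a raw Blazegraph endpoint may need the PREFIX declarations spelled out:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
# how many items are instances of "house cat" (Q146) in the local store?
SELECT (COUNT(?item) AS ?count)
WHERE { ?item wdt:P31 wd:Q146. }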
If I run the "Recent Events" query from the examples:
#Recent Events
SELECT ?event ?eventLabel ?date
WHERE
{
  # find events
  ?event wdt:P31/wdt:P279* wd:Q1190554.
  # with a point in time or start date
  OPTIONAL { ?event wdt:P585 ?date. }
  OPTIONAL { ?event wdt:P580 ?date. }
  # but at least one of those
  FILTER(BOUND(?date) && DATATYPE(?date) = xsd:dateTime).
  # not in the future, and not more than 31 days ago
  BIND(NOW() - ?date AS ?distance).
  FILTER(0 <= ?distance && ?distance < 31).
  # and get a label as well
  OPTIONAL {
    ?event rdfs:label ?eventLabel.
    FILTER(LANG(?eventLabel) = "en").
  }
}
# limit to 10 results so we don't timeout
LIMIT 10
The Wikidata Query Service obviously returns 10 results. I have ONE.
Why these differences in the results? Did I do anything wrong?
Thank you in advance.
Additional info about the machine where I'm running Wikidata, just in case it's important.
Upvotes: 1
Views: 471
Reputation: 15769
In January 2018 I did a successful Wikidata import following the instructions you'll find at http://wiki.bitplan.com/index.php/WikiData#Import. My first try with a standard hard disk took so long that I estimated a 10-day import time. When I switched to an SSD, the import time went down to 2.9 days. At the time I needed a 512 GByte SSD to fit the .jnl file.
Since 2018-01 Wikidata has grown further, so you can expect at least a proportional increase in the import time. There has been some recent discussion about importing on the Wikidata mailing list, so that's where you'll find hints on alternatives and speed issues.
Before the import is finished you won't get sensible results, because linking triples might not be there yet.
For the cats example, my 2018-01 import gives 111 results after 2 seconds. The events example depends on when you run the query, when you did the import, and how many events per month fall in the period. I changed the 31 days to 600 to get 10 results after some 30 seconds. If I run the query with no LIMIT and the original 31-day window, it does not return a result even after 7 hours ...
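Concretely, that experiment only changes the window in the distance filter of the events query above (keeping the LIMIT 10):
# widen the window from 31 days to 600 days
FILTER(0 <= ?distance && ?distance < 600).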
Upvotes: 1