S M Shamimul Hasan

Reputation: 6674

Virtuoso and Jena: Large RDF graphs loading issue

I have a 200GB RDF file in .nt (N-Triples) format. I want to load it into Virtuoso (Virtuoso Open-Source Edition 6.1.6). I used the Virtuoso bulk loader from the command line, but the load hangs after a couple of hours of running. Does anyone know how I can load a file this large into Virtuoso efficiently and quickly?

I also tried to query my 200GB RDF graph with Apache Jena. However, after running for 30 minutes it fails with a heap space error. If you have a solution for either problem, please let me know.

Upvotes: 0

Views: 1002

Answers (2)

HughWilliams

Reputation: 126

What is the actual dataset you are loading? Is it really just one file? We recommend splitting it into files of about 1GB at most and loading multiple files at a time with the bulk loader.
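
As a rough illustration of the splitting step, here is a minimal Python sketch. It relies on the fact that N-Triples stores one triple per line, so cutting on line boundaries keeps every chunk a valid .nt file. The function name `split_ntriples`, the `chunk_00000.nt` naming scheme, and the 1GB default are assumptions made for this example, not part of Virtuoso or its bulk loader.

```python
import os


def split_ntriples(src_path, out_dir, max_bytes=1_000_000_000):
    """Split an N-Triples file into chunk files of at most ~max_bytes each.

    Hypothetical helper for illustration. Chunks are cut only on line
    boundaries, so no triple is ever split across two files.
    Returns the list of chunk paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    written = 0
    try:
        # Binary mode streams the file line by line without decoding it.
        with open(src_path, "rb") as src:
            for line in src:
                # Open a new chunk if none is open yet, or if adding this
                # line would push the current chunk over the size limit.
                if out is None or (written > 0 and written + len(line) > max_bytes):
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"chunk_{len(chunk_paths):05d}.nt")
                    out = open(path, "wb")
                    chunk_paths.append(path)
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

The resulting directory of chunk files can then be registered with the bulk loader in place of the single 200GB file. The script reads one line at a time, so memory use stays small regardless of input size.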

Have you done any performance tuning of the Virtuoso Server for the resources available on the machine in use, as detailed in the RDF Performance Tuning guide?
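
For a starting point on buffer sizing, the sketch below encodes a commonly quoted rule of thumb: give Virtuoso roughly two thirds of free RAM, count each buffer as about 8000 bytes, and set MaxDirtyBuffers to about three quarters of NumberOfBuffers. Treat these ratios as assumptions to check against the tuning guide for your version; the function name `suggest_buffers` is made up for this example.

```python
def suggest_buffers(free_ram_bytes, ram_percent=66, bytes_per_buffer=8000):
    """Estimate NumberOfBuffers and MaxDirtyBuffers for virtuoso.ini.

    Hypothetical helper: the 66% share, 8000 bytes per buffer, and the
    3/4 dirty-buffer ratio are rule-of-thumb assumptions, not values
    read from any Virtuoso API.
    """
    # Integer arithmetic avoids floating-point rounding surprises.
    number_of_buffers = free_ram_bytes * ram_percent // 100 // bytes_per_buffer
    max_dirty_buffers = number_of_buffers * 3 // 4
    return number_of_buffers, max_dirty_buffers


if __name__ == "__main__":
    buffers, dirty = suggest_buffers(16_000_000_000)  # e.g. 16 GB free
    print(f"NumberOfBuffers = {buffers}")
    print(f"MaxDirtyBuffers = {dirty}")
```

The printed values would go into the [Parameters] section of virtuoso.ini, followed by a server restart.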

Please check with the status(''); command how many buffers are in use. If you run out during a load, you will be swapping to disk continuously, which leads to the sort of apparent hangs you report.

Note you can also load the Virtuoso LD Meter functions to monitor the progress of the dataset loads.

Upvotes: 0

AndyS

Reputation: 16700

Jena TDB has a bulk loader which has been used on large inputs (hundreds of millions of triples).

Upvotes: 0
