Mark Small

Reputation: 414

Apache Jena 3.4.0 database error during data phase

I am trying to generate a new database with tdbloader2, using the command:

/opt/apache-jena-3.4.0/bin/tdbloader2 \
  --loc /fuseki/databases/blue/DS-DB \
  /var/lib/data/work/time-stamp.nt \
  /var/lib/data/work/Output/*.ttl \
  /var/lib/data/work/ReferenceData/*.ttl \
  /var/lib/data/work/vocabs/*.ttl

The error I get is:

...
INFO  Load: /var/lib/data/work/ReferenceData/SAMPLING_POINT_TYPES.ttl -- 2024/03/04 00:01:17 UTC                                                                                                           
INFO  Load: /var/lib/data/work/ReferenceData/UNITS.ttl -- 2024/03/04 00:01:17 UTC                                                                                                                          
INFO  Load: /var/lib/data/work/vocabs/determinand.ttl -- 2024/03/04 00:01:17 UTC                                                                                                                      
INFO  Load: /var/lib/data/work/vocabs/ea-org.ttl -- 2024/03/04 00:01:17 UTC                                                                                                                           
INFO  Load: /var/lib/data/work/vocabs/intentional-determinand-groups.ttl -- 2024/03/04 00:01:17 UTC                                                                                                   
INFO  Load: /var/lib/data/work/vocabs/result-qualifiers.ttl -- 2024/03/04 00:01:17 UTC                                                                                                                
INFO  Load: /var/lib/data/work/vocabs/sample.ttl -- 2024/03/04 00:01:17 UTC                                                                                                                           
INFO  Load: /var/lib/data/work/vocabs/sampling-point-status.ttl -- 2024/03/04 00:01:17 UTC                                                                                                            
INFO  Load: /var/lib/data/work/vocabs/sampling-point.ttl -- 2024/03/04 00:01:17 UTC                                                                                                                   
INFO  Load: /var/lib/data/work/vocabs/unit.ttl -- 2024/03/04 00:01:17 UTC                                                                                                                             
INFO  Total: 389,245,433 tuples : 4,012.31 seconds : 97,012.73 tuples/sec [2024/03/04 00:01:21 UTC]                                                                                                                
 00:01:22 INFO Data Load Phase Completed                                                                                                                                                                           
 00:01:22 INFO Index Building Phase                                                                                                                                                                                
 00:01:22 INFO Creating Index SPO                                                                                                                                                                                  
 00:01:22 INFO Sort SPO                                                                                                                                                                                            
/opt/apache-jena-3.4.0/bin/tdbloader2index: line 306: 63100 Killed                  sort $SORT_ARGS -u $KEYS < "$DATA" > $WORK                                                                                     
 00:02:28 ERROR Failed during data phase                                                                                                                                                                           

I expect the database to be correctly generated and the command to complete successfully.

There is a lot of data in these *.ttl files: 2,050 files in total, most around 10M in size each. (I have tried this with a much smaller sample and it works just fine.) I have tried this on a machine with 32G of memory, so I don't think memory is the issue, at least not from what I can tell from the error message.

I also tried the same thing on a machine with a GPU and 64G of memory and got the same result, so I'm pretty sure it has nothing to do with the GPU/CPU or memory. However, I am completely stuck on what else it could be, so any help would be much appreciated. Thanks in advance.

Upvotes: 0

Views: 38

Answers (1)

Mark Small

Reputation: 414

As I stated in my last comment, I think my dev VM simply didn't have enough disk space/memory for this task. I've been given access to another VM with more resources, and so far I haven't experienced this again. The continual battle between devs and infrastructure...
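For anyone hitting the same thing, here is a rough pre-flight check I would run before starting a large load. It is only a sketch: the helper names `avail_kb` and `has_space` and the threshold are my own, not part of Jena. I am also assuming that the `sort` step writes its temporary files to `$TMPDIR` (or `/tmp` if unset), which is the default for GNU sort.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Jena): check free disk space before
# running a large tdbloader2 job.

# Print the available space, in kilobytes, on the filesystem holding
# the directory given as $1. "df -P" forces the portable one-line format,
# where the fourth column is the available space.
avail_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Succeed (exit status 0) if the directory $1 has at least $2 KB free.
has_space() {
  [ "$(avail_kb "$1")" -ge "$2" ]
}

# Example usage (the threshold is an assumption; pick one that suits your data):
#   has_space "${TMPDIR:-/tmp}" 200000000 || echo "sort temp dir looks too small"
#   has_space /fuseki/databases/blue 200000000 || echo "database volume looks too small"
```

If a run has already failed with `Killed`, it may also be worth looking in the kernel log for out-of-memory kills, for example with `dmesg | grep -i -E 'out of memory|killed process'` (this may need root), and checking memory with `free -h`.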

Upvotes: 1
