Cameron Kerr

Reputation: 1875

After upgrading to DSpace 5.1, 'Generate citation from metadata' fails with no errors logged

We have recently upgraded to DSpace 5.1 after having inherited a system from a much more capable DSpace administrator.

One of the issues we have recently noticed is that some citations are not being generated. I could find nothing in the dspace, tomcat or solr logs that points to the fault.

Running the curation task manually (the same command cron runs every two minutes) takes about as long as expected and doesn't abort with a failure.

sudo -u tomcat /usr/local/dspace/bin/dspace curate -q continually -r - -v

From the administrative interface, the task fails if I don't specify a handle.

If I do specify the handle of an object that doesn't have a citation, it works.

Here is what it says on failure:

Task: Generate citation from metadata The task was completed successfully. STATUS: Fail, RESULT: The curation task did not provide more information about the outcome.

and on success:

Task: Generate citation from metadata The task was completed successfully. STATUS: Success, RESULT: Added citation ...

When I generate a citation successfully, I get the following logs (filtered) in the dspace log:

2015-04-28 16:47:26,122 INFO  org.dspace.content.Item @ ...:session_id=...:ip_addr=...:update_item:item_id=5966
2015-04-28 16:47:26,367 INFO  org.dspace.curate.Curator @ Curation task: citation performed on: .../5634 with status: 0. Result: 'Added citation ...

If I run the task with no handle, then I get the following:

2015-04-28 16:52:19,972 INFO  org.dspace.curate.Curator @ Curation task: citation performed on:  with status: 1

Hmm, I wonder whether I now have to specify a handle of PREFIX/0. Trying that gives:

Task: Generate citation from metadata The task was completed successfully.

STATUS: Skip, RESULT: Item already has citation, skipping; item_id=2479

Should I be running curation tasks (from cron) differently now?

Perhaps curation tasks are not getting queued correctly when a new object is submitted. Where can I inspect the curation task queue (even if that means looking in the database)?

Cheers, Cameron

Update

/usr/local/dspace/ctqueues/continually# ls -l
total 448
-rw-r--r-- 1 tomcat tomcat      0 Apr 17 09:40 lock0
-rw-r--r-- 1 tomcat tomcat 420832 Apr 17 09:37 queue0
-rw-r--r-- 1 tomcat tomcat  15691 Apr 28 10:44 queue1
-rw-r--r-- 1 tomcat tomcat  12986 Apr 14 13:39 queue6

I wanted to preserve the contents of the queue so that it would drain, so I removed only the lock0 file and ran sudo -u tomcat /usr/local/dspace/bin/dspace curate -q continually -r - -v to drain the work queues. All I seem to get is two new lock files, lock0 and lock1.

I was expecting the dspace command to finish and the queue files to be removed, but my understanding must be wrong. I then deleted the files and went to restart tomcat. The server is now busy and seems to be processing curation tasks, so I'll take that as a promising sign. Thanks.

Upvotes: 0

Views: 252

Answers (1)

schweerelos

Reputation: 2189

First of all, the citation generator is not a standard DSpace task.

Secondly, yes, you need to run system-wide curation tasks in the admin UI using [your-prefix]/0. DSpace should show an error message if no handle is given; I'm not sure whether it does.

None of the changes in DSpace 5.1 would have an impact on how you'd run curation tasks via cron.

The curation task queue is file based and lives in [dspace]/ctqueues/[name-of-queue]/[queue-file]. From your command above, it looks like the task queue you're interested in is the "continually" queue. It may be that some hiccup in processing curation tasks left behind old queueN files and this may disrupt processing of the queue / adding new tasks to that queue. You may wish to delete those old queueN files.
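To make that directory easier to eyeball, here is a minimal Python sketch that lists every file in a queue directory with its size and age. The function name summarize_queue_dir is hypothetical (it is not part of DSpace), and the sketch deliberately does not parse the contents of the queue files; it only reports what the filesystem knows.

```python
import time
from pathlib import Path


def summarize_queue_dir(queue_dir):
    """Return (name, size_bytes, age_seconds) for each regular file in
    queue_dir, sorted by file name.

    Hypothetical helper, not a DSpace API: it only reads filesystem
    metadata and never opens or modifies the queue or lock files.
    """
    now = time.time()
    rows = []
    for entry in sorted(Path(queue_dir).iterdir()):
        if entry.is_file():
            st = entry.stat()
            # Age is measured from the file's last modification time.
            rows.append((entry.name, st.st_size, now - st.st_mtime))
    return rows
```

With the path from the question, this would be called as summarize_queue_dir("/usr/local/dspace/ctqueues/continually"), run as a user that can read the directory. Large queueN files that have not been modified for days would be the candidates to look at first.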

Update

Your understanding of how the queue files should behave is correct. You may be seeing two lock files if two commands are working off the curation queue at the same time, e.g. when the second one gets started while the first one is still running. That can happen if you work off the same queue often and for some reason there was a pileup of queue files. I believe I've seen lock files left behind when the curate command crashed, so you may wish to add the queue directories to your list of things to check after a crash. Also, always make sure the ctqueues directory tree has the correct permissions (not an issue in your case, it looks like).
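As a rough aid for that post-crash check, the following Python sketch flags lock files that are older than a threshold you choose. The name find_stale_locks and the threshold are assumptions for illustration, and the lock* pattern is taken from the lock0 / lock1 names in the listing in the question. An old lock file is only a hint: confirm that no curate command is still running before removing anything.

```python
import time
from pathlib import Path


def find_stale_locks(queue_dir, max_age_seconds):
    """Return the names of lock* files in queue_dir whose last
    modification is older than max_age_seconds, sorted by name.

    Hypothetical helper, not a DSpace API. It only reports candidates
    and never deletes anything.
    """
    now = time.time()
    stale = []
    for entry in sorted(Path(queue_dir).glob("lock*")):
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            stale.append(entry.name)
    return stale
```

For a queue that cron works off every two minutes, a threshold of an hour or so (find_stale_locks(path, 3600)) would be one plausible starting point, to be adjusted to how long your curation runs normally take.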

Upvotes: 0
