Bharadwaj

Reputation: 93

MLCP load throws DeadLock - MarkLogic 8

I am running MarkLogic 8 on a cluster of two RHEL6 servers. While loading data with mlcp, I get DEADLOCK messages (logged at Notice level). Details:

Data: 500+ CSV files

File name Examples:
File1: 20170927_**ABC**_XX_YY.CSV
File2: 20170927_**DEF**_QX_QY.CSV
File3: 20170927_**DE**_QX_QY.CSV

Requirement: I need to load these documents and assign each CSV to a collection during the load. File1 should go into the ABC collection, File2 into the DEF collection, and File3 into the DE collection.

Script: I have tried to achieve this by loading each CSV individually using mlcp.

#!/bin/sh
# Iterate over the glob directly instead of parsing `ls -l` output
for each in /location/*.CSV
do
     # Collection name is the second underscore-separated field of the file name
     collName=$(basename "$each" | cut -d_ -f2)
     "$MLCP_HOME"/mlcp.sh import -mode local -options_file connect.txt \
     -input_file_path "$each" -input_file_type delimited_text \
     -generate_uri -output_collections "$collName"
done

Issue: Some of the files were loaded into MarkLogic without any error. However, I see 'Notice' level DEADLOCK messages in the logs, and the load stalls.

Question: As I understand it, a DEADLOCK occurs when two or more queries (updates) try to acquire a lock on a URI that is already write-locked.

  1. I expected that, however many threads an mlcp load uses, only one of them would write to a given URI at a time. How is a DEADLOCK possible?
  2. Why is it called a DEADLOCK when one query is just waiting for the other to complete? Isn't that simply queuing?

The MarkLogic docs give the following code as an example of a deadlock. I do not understand why it is a deadlock; one command is just waiting for the other to complete.

(: the next line ensures this runs as an update statement :)
if ( 1 = 2) then ( xdmp:document-insert("foobar", <a/>) ) else (),
doc("/docs/test.xml"),
xdmp:eval("xdmp:node-replace(doc('/docs/test.xml')/a, <b>goodbye</b>)",
          (),
          <options xmlns="xdmp:eval">
            <isolation>different-transaction</isolation>
          </options>) ,
doc("/docs/test.xml")

Upvotes: 1

Views: 256

Answers (1)

grtjn

Reputation: 20414

I can't really see why you are getting deadlocks. I'd still suspect something outside MLCP is generating those messages. Could a scheduled task, or some entirely separate process, be causing them?
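One rough way to check this is to see whether the DEADLOCK messages line up with the times your mlcp runs were active. The sketch below counts DEADLOCK lines per minute in the error log. It assumes the usual Linux log location (/var/opt/MarkLogic/Logs/ErrorLog.txt) and that each log line starts with a "YYYY-MM-DD HH:MM" timestamp; the LOG variable and the deadlocks_per_minute function are names made up for this example, so adjust them to your setup.

```shell
#!/bin/sh
# Assumed default ErrorLog location on Linux; override LOG if yours differs.
LOG=${LOG:-/var/opt/MarkLogic/Logs/ErrorLog.txt}

# Print the number of DEADLOCK lines per minute.
# Assumes each log line begins with "YYYY-MM-DD HH:MM" (the first 16 characters).
deadlocks_per_minute() {
    grep DEADLOCK "$1" | cut -c1-16 | sort | uniq -c
}

if [ -r "$LOG" ]; then
    deadlocks_per_minute "$LOG"
fi
```

If the counts show up in minutes when no mlcp job was running, something else is likely taking the locks.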

I can try to explain deadlocks in respect to MLCP a bit more.

Deadlocks normally occur when you touch a file in update mode, and then spawn, invoke, or eval code that touches the same file. The request that spawns, invokes, or evals hasn't finished yet, so its automatic read or write lock hasn't been released. The subprocess sees the lock and is forced to wait until it is released. The parent process, however, is waiting for the subprocess to complete, so neither can make progress: that is the deadlock, as opposed to simple queuing.

It becomes a little more complex with MLCP, since MLCP opens long-lasting transactions and issues multiple calls that participate in the same transaction. The automatic locks are not released until the entire long-lasting transaction is finished. So if MLCP tries to insert the same file twice in the same transaction, that will be trouble.

There might be a way to check if it really is MLCP that is causing the trouble. There are a few command-line arguments that control how many threads are used, and how many requests are included in one transaction. Try using:

-transaction_size 1 -batch_size 1

If you really want to process your files sequentially, also add:

-thread_count 1

You can run MLCP with just the import command (and no other arguments) to get a summary of all the command-line options.
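Put together with the loop from the question, that could look roughly like the sketch below. It only adds the three options above to the original command. The coll_from_path and load_one functions and the DRY_RUN switch are hypothetical helpers introduced for this example (DRY_RUN=1 prints the command instead of running it); connect.txt and /location come from the question.

```shell
#!/bin/sh
# Sketch: the per-file loop from the question plus the diagnostic options.

# Collection name = second underscore-separated field of the file name.
coll_from_path() {
    basename "$1" | cut -d_ -f2
}

# Build the mlcp command for one file; print it if DRY_RUN=1, otherwise run it.
load_one() {
    file=$1
    coll=$(coll_from_path "$file")
    set -- "$MLCP_HOME/mlcp.sh" import -mode local -options_file connect.txt \
        -input_file_path "$file" -input_file_type delimited_text \
        -generate_uri -output_collections "$coll" \
        -transaction_size 1 -batch_size 1 -thread_count 1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

for f in /location/*.CSV; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    load_one "$f"
done
```

This will be slow, so treat it as a diagnostic run: if the DEADLOCK messages disappear with these settings, MLCP's batching or threading is probably involved; if they persist, look elsewhere.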

HTH!

Upvotes: 2
