Enrico

Reputation: 11

DataStax Solr nodes: nodetool repair stuck

We have two data centers (DC1 in Europe, DC2 in North America) in a DataStax Enterprise Solr cluster (version 4.5) on CentOS:

DC1: 2 nodes with RF set to 2
DC2: 1 node with RF set to 1

Every node has 2 cores and 4 GB of RAM. We created only one keyspace; the 2 nodes in DC1 have 400 MB of data each, while the node in DC2 is empty.

If I start a nodetool repair on the node in DC2, the command works well for about 20-30 minutes and then stops making progress, remaining stuck.
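
For reference, this is roughly how the repair is started and watched (my_ks is a placeholder for our keyspace name):

nodetool repair my_ks        # run on the DC2 node; streams data over from the DC1 replicas

nodetool netstats            # from another shell: shows active streams
nodetool compactionstats     # shows pending compactions / index builds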

In the logs of the node in DC2 I can read this:

WARN [NonPeriodicTasks:1] 2014-10-01 05:57:44,188 WorkPool.java (line 398) Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
 millis, consider increasing it, or reducing load on the node.
ERROR [NonPeriodicTasks:1] 2014-10-01 05:57:44,190 CassandraDaemon.java (line 199) Exception in thread Thread[NonPeriodicTasks:1,5,main]
org.apache.solr.common.SolrException: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
 millis, consider increasing it, or reducing load on the node.
    at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.commit(CassandraDirectUpdateHandler.java:351)
    at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.doCommit(AbstractSolrSecondaryIndex.java:994)
    at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.forceBlockingFlush(AbstractSolrSecondaryIndex.java:139)
    at org.apache.cassandra.db.index.SecondaryIndexManager.flushIndexesBlocking(SecondaryIndexManager.java:338)
    at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:144)
    at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:113)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
 millis, consider increasing it, or reducing load on the node.
    at com.datastax.bdp.concurrent.WorkPool.doFlush(WorkPool.java:399)
    at com.datastax.bdp.concurrent.WorkPool.flush(WorkPool.java:339)
    at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.flushIndexUpdates(AbstractSolrSecondaryIndex.java:484)
    at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.commit(CassandraDirectUpdateHandler.java:278)
    ... 12 more
 WARN [commitScheduler-3-thread-1] 2014-10-01 05:58:47,351 WorkPool.java (line 398) Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
 millis, consider increasing it, or reducing load on the node.
ERROR [commitScheduler-3-thread-1] 2014-10-01 05:58:47,352 SolrException.java (line 136) auto commit error...:org.apache.solr.common.SolrException: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
 millis, consider increasing it, or reducing load on the node.
    at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.commit(CassandraDirectUpdateHandler.java:351)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
 millis, consider increasing it, or reducing load on the node.
    at com.datastax.bdp.concurrent.WorkPool.doFlush(WorkPool.java:399)
    at com.datastax.bdp.concurrent.WorkPool.flush(WorkPool.java:339)
    at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.flushIndexUpdates(AbstractSolrSecondaryIndex.java:484)
    at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.commit(CassandraDirectUpdateHandler.java:278)
    ... 8 more

I tried increasing some timeouts in the cassandra.yaml file, without luck. Thanks.

Upvotes: 1

Views: 1197

Answers (3)

phact

Reputation: 7305

One way to reduce contention from Solr indexing is to increase the autoSoftCommit maxTime in your solrconfig.xml:

<autoSoftCommit>
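   <!-- maxTime is in milliseconds: 1000000 ms is roughly 16.7 minutes between soft commits -->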
   <maxTime>1000000</maxTime>
</autoSoftCommit>

Upvotes: 0

Caleb Rackliffe

Reputation: 575

Two items that might be helpful:

  1. The RuntimeException you are seeing in the log is along the Lucene code path that commits index changes to disk, so I would certainly determine whether or not writing to disk is your bottleneck. (Are you using different physical disks for your data and commit log?)

  2. The parameter that you probably want to tweak in the meantime, the one that controls WorkPool flush timeouts in dse.yaml, is called flush_max_time_per_core (see the sketch after this list).
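
A minimal sketch of that setting, assuming the stock dse.yaml layout (the value below is illustrative, not a recommendation; check the comments in your own copy of the file for the unit, which is minutes in DSE 4.5):

# dse.yaml
# Maximum time to wait for Solr index flushes before the WorkPool
# timeout seen in the log above fires. Unit is minutes; 10 is just
# an example value to raise the ceiling.
flush_max_time_per_core: 10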

Upvotes: 1

Nom de plume

Reputation: 461

Your nodes are fairly under-specified for a DSE Solr installation.

I would normally recommend at least 8 cores and at least 64 GB of memory, allocating 12-14 GB of that to the heap.
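
For example, the heap can be pinned in cassandra-env.sh rather than letting the script auto-size it (a sketch, assuming the packaged file location; the script wants both values set together):

# cassandra-env.sh
MAX_HEAP_SIZE="12G"
HEAP_NEWSIZE="800M"    # rule of thumb: roughly 100 MB per physical core (8 cores here)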

The following troubleshooting guide is pretty good:

https://support.datastax.com/entries/38367716-Solr-Configuration-Best-Practices-and-Troubleshooting-Tips

Your current data load is small, so you probably don't need the full whack of memory - I'd guess the bottleneck here is the CPUs.

If you're not running 4.0.4 or 4.5.2, I'd get up to one of those versions.

Upvotes: 1
