PJ.
PJ.

Reputation: 1206

Stream error during end-stage of decommission

I got the following exception during the end-stage of my node's decommission:

Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
    at org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2946)
    at org.apache.cassandra.service.StorageService.decommission(StorageService.java:2903)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
    at sun.rmi.transport.Transport$1.run(Transport.java:177)
    at sun.rmi.transport.Transport$1.run(Transport.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2941)
    ... 36 more
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
    at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1160)
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
    at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
    at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
    at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:216)
    at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191)
    at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:331)
    at org.apache.cassandra.streaming.StreamSession.convict(StreamSession.java:600)
    at org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:237)
    at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:643)
    at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:64)
    at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:170)
    at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:75)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    ... 3 more

This exception trace came from the nodetool decommission command itself. No exception is logged in the decommissioned node's system.log

In the receiving node, the exception is as follows:

 INFO [NonPeriodicTasks:1] 2014-06-02 04:40:53,101 SecondaryIndexManager.java (line 146) Index build of [myks.mycf] complete
ERROR [NonPeriodicTasks:1] 2014-06-02 04:40:53,240 CassandraDaemon.java (line 198) Exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: Outgoing stream handler has been closed
    at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:170)
    at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:620)
    at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:566)
    at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:120)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

A brief background

I have 4 nodes in my Solr DC including the node that I'm decommissioning. Two of the three remaining nodes already completed the streams and index build a few days ago; and the last one (the slowest) completed its index build just a few hours ago. As shown in the node's log above, the exception happened right after the completion of the index build.

The question

Can I assume that the node has successfully decommissioned despite the exception? As far as I can tell, all files that the last node was supposed to receive were transferred successfully (nodetool netstats shows 100% for all files before the exception happened). I am thinking that the stream error only pertains to the failure to close the stream session - because I noticed that even though the streams were completed days ago, the session is kept open until the index build is done (the proof is that netstats is still showing some output during the several days that the index build is running). I need somebody to confirm if this is correct and whether I can safely delete the data files in the decommissioned node.

Some additional info

  1. DSE 4.0.3 (Cassandra 2.0.7)
  2. Vnodes enabled
  3. CentOS 6 x86_64
  4. nodetool status and nodetool gossipinfo still show the decommissioned node as "LEAVING"

Upvotes: 3

Views: 1720

Answers (1)

Nom de plume
Nom de plume

Reputation: 461

I think you hit one of the many streaming bugs that were present back in the 2.0 days. Maybe even this one:

https://issues.apache.org/jira/browse/CASSANDRA-8343

If that not that one there were a few others. In any case my main advice would be to get this cluster to the latest in 4.8. If for operational reasons in the short term that's not practical I would at least push for the latest 4.0. There really are a lot of fixes between that and current:

https://docs.datastax.com/en/datastax_enterprise/4.0/datastax_enterprise/RNdse40.html

The 4.0.7 fix list alone is eye-watering!

Upvotes: 0

Related Questions