Reputation: 1206
I got the following exception during the end-stage of my node's decommission:
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2946)
at org.apache.cassandra.service.StorageService.decommission(StorageService.java:2903)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at sun.rmi.transport.Transport$1.run(Transport.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2941)
... 36 more
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1160)
at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:216)
at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:191)
at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:331)
at org.apache.cassandra.streaming.StreamSession.convict(StreamSession.java:600)
at org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:237)
at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:643)
at org.apache.cassandra.gms.Gossiper.access$700(Gossiper.java:64)
at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:170)
at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:75)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
... 3 more
This exception trace came from the nodetool decommission command itself. No exception is logged in the decommissioned node's system.log
In the receiving node, the exception is as follows:
INFO [NonPeriodicTasks:1] 2014-06-02 04:40:53,101 SecondaryIndexManager.java (line 146) Index build of [myks.mycf] complete
ERROR [NonPeriodicTasks:1] 2014-06-02 04:40:53,240 CassandraDaemon.java (line 198) Exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: Outgoing stream handler has been closed
at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:170)
at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:620)
at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:566)
at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:120)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
A brief background
I have 4 nodes in my Solr DC including the node that I'm decommissioning. Two of the three remaining nodes already completed the streams and index build a few days ago; and the last one (the slowest) completed its index build just a few hours ago. As shown in the node's log above, the exception happened right after the completion of the index build.
The question
Can I assume that the node has successfully decommissioned despite the exception? As far as I can tell, all files that the last node was supposed to receive were transferred successfully (nodetool netstats shows 100% for all files before the exception happened). I am thinking that the stream error only pertains to the failure to close the stream session - because I noticed that even though the streams were completed days ago, the session is kept open until the index build is done (the proof is that netstats is still showing some output during the several days that the index build is running). I need somebody to confirm if this is correct and whether I can safely delete the data files in the decommissioned node.
Some additional info
Upvotes: 3
Views: 1720
Reputation: 461
I think you hit one of the many streaming bugs that were present back in the 2.0 days. Maybe even this one:
https://issues.apache.org/jira/browse/CASSANDRA-8343
If that not that one there were a few others. In any case my main advice would be to get this cluster to the latest in 4.8. If for operational reasons in the short term that's not practical I would at least push for the latest 4.0. There really are a lot of fixes between that and current:
https://docs.datastax.com/en/datastax_enterprise/4.0/datastax_enterprise/RNdse40.html
The 4.0.7 fix list alone is eye-watering!
Upvotes: 0