Reputation: 10865
We are currently in a situation where a DSE node decided to decommission itself. It seems that it first hit a Too many open files
error and then decided it was OK to remove itself from the ring because the disk is FULL
. Aside from the broader philosophical issue of having a node remove itself, the disk was only 1/4 utilized.
Here are the relevant entries from the log file:
ERROR [pool-1-thread-1] 2014-06-20 01:53:19,957 DiskHealthChecker.java (line 62) Error in finding disk space for directory /raid0/cassandra/data
java.io.IOException: Cannot run program "df": error=24, Too many open files
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
at java.lang.Runtime.exec(Runtime.java:617)
at java.lang.Runtime.exec(Runtime.java:485)
at org.apache.commons.io.FileSystemUtils.openProcess(FileSystemUtils.java:535)
at org.apache.commons.io.FileSystemUtils.performCommand(FileSystemUtils.java:482)
at org.apache.commons.io.FileSystemUtils.freeSpaceUnix(FileSystemUtils.java:396)
at org.apache.commons.io.FileSystemUtils.freeSpaceOS(FileSystemUtils.java:266)
at org.apache.commons.io.FileSystemUtils.freeSpaceKb(FileSystemUtils.java:200)
at org.apache.commons.io.FileSystemUtils.freeSpaceKb(FileSystemUtils.java:171)
at com.datastax.bdp.util.DiskHealthChecker.checkDiskSpace(DiskHealthChecker.java:52)
at com.datastax.bdp.util.DiskHealthChecker.checkDiskSpace(DiskHealthChecker.java:71)
at com.datastax.bdp.util.DiskHealthChecker.checkDiskSpace(DiskHealthChecker.java:71)
at com.datastax.bdp.util.DiskHealthChecker.checkDiskSpace(DiskHealthChecker.java:71)
at com.datastax.bdp.util.DiskHealthChecker.checkDiskSpace(DiskHealthChecker.java:71)
at com.datastax.bdp.util.DiskHealthChecker.checkDiskSpace(DiskHealthChecker.java:71)
at com.datastax.bdp.util.DiskHealthChecker.checkDiskSpace(DiskHealthChecker.java:71)
at com.datastax.bdp.util.DiskHealthChecker.access$000(DiskHealthChecker.java:18)
at com.datastax.bdp.util.DiskHealthChecker$DiskHealthCheckTask.run(DiskHealthChecker.java:104)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: error=24, Too many open files
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
... 24 more
INFO [pool-1-thread-1] 2014-06-20 01:53:19,959 DiskHealthChecker.java (line 82) Removing this node from the ring for the disk is close to FULL
INFO [pool-1-thread-1] 2014-06-20 01:53:19,996 StorageService.java (line 947) LEAVING: sleeping 30000 ms for pending range setup
ERROR [ReadStage:30] 2014-06-20 01:53:22,058 CassandraDaemon.java (line 191) Exception in thread Thread[ReadStage:30,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /raid0/cassandra/data/linkcurrent_search/content_items/linkcurrent_search-content_items-ic-1803-Data.db (Too many open files)
at org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:64)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /raid0/cassandra/data/linkcurrent_search/content_items/linkcurrent_search-content_items-ic-1803-Data.db (Too many open files)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:58)
at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1213)
at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:66)
at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1017)
at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:72)
at org.apache.cassandra.db.ColumnFamilyStore.getSequentialIterator(ColumnFamilyStore.java:1432)
at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1484)
at org.apache.cassandra.service.RangeSliceVerbHandler.executeLocally(RangeSliceVerbHandler.java:46)
at org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:58)
... 4 more
Caused by: java.io.FileNotFoundException: /raid0/cassandra/data/linkcurrent_search/content_items/linkcurrent_search-content_items-ic-1803-Data.db (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:67)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:75)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:54)
... 12 more
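For what it's worth, the trace shows why the health check itself fell over: FileSystemUtils.freeSpaceKb shells out to df, and forking a child process consumes file descriptors, so the check dies with error=24 precisely when the node is already out of descriptors. Below is a minimal sketch of an in-process alternative using java.io.File; it is illustrative only, not DSE's actual code, and the class name, threshold, and path are assumptions:
import java.io.File;

// Illustrative sketch only, not DSE's implementation. Unlike
// FileSystemUtils.freeSpaceKb (seen in the trace above), which forks
// a "df" process, java.io.File reports free space without spawning
// anything, so it keeps working under fd exhaustion.
public class InProcessDiskCheck {

    // Assumed threshold for "close to full"; DSE's real value may differ.
    private static final double FULL_THRESHOLD = 0.90;

    public static boolean isCloseToFull(String path) {
        File dir = new File(path);
        long total = dir.getTotalSpace();   // bytes on the partition
        long usable = dir.getUsableSpace(); // bytes available to this JVM
        if (total == 0) {
            // getTotalSpace() returns 0 when the path cannot be statted;
            // treat that as "unknown" rather than "full".
            return false;
        }
        double usedFraction = 1.0 - (double) usable / total;
        return usedFraction >= FULL_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(isCloseToFull("/raid0/cassandra/data"));
    }
}
Either way, the bogus "disk is FULL" decommission is only the symptom; raising the open-file limit for the Cassandra process (e.g. ulimit -n, or nofile in /etc/security/limits.conf) addresses the underlying descriptor exhaustion.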
Upvotes: 0
Views: 168
Reputation: 822
If you haven't already, you may want to set
health_check_interval: 0
in your dse.yaml file to disable this check for now.
Upvotes: 1
Reputation: 310
Thanks for the finding. We will disable this function and leave it to other disk monitoring tools to alert administrators when the disk is close to full, so they can take action before it actually fills up.
Upvotes: 1