Reputation: 65
From time to time, after inserting new documents into Elasticsearch v1.1.0, I get a NoShardAvailableActionException. When checking with
curl 'localhost:9200/_cat/shards/cvk'
I get the response: cvk 0 p UNASSIGNED
After restarting Elasticsearch with:
/etc/init.d/elasticsearch restart
everything works fine again.
ES is running on an Ubuntu 12 VPS. The index has only one shard and replication is disabled.
I found "no space left on device" error in my log. But I have enough disc space on machine. I'm uploading in a batch of 1000 documents (each one is about 512 bytes).
How can I fix the flushing problem? And if that's not possible, how can I reassign the shard via the REST interface (without restarting the server)?
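For example, would something like the cluster reroute API be the right tool here? A rough sketch (node name taken from the log below; as far as I understand, allow_primary forces an empty primary, so the shard's data could be lost):
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    { "allocate": { "index": "cvk", "shard": 0, "node": "Molecule Man", "allow_primary": true } }
  ]
}'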
df output from my VPS:
Filesystem Size Used Avail Use% Mounted on
/dev/vda 20G 13G 6.6G 65% /
udev 237M 12K 237M 1% /dev
tmpfs 50M 216K 49M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 246M 0 246M 0% /run/shm
The log from the VPS shows these errors:
[2014-05-03 04:20:20,088][WARN ][index.translog ] [Molecule Man] [cvk][0] failed to flush shard on translog threshold
org.elasticsearch.index.engine.FlushFailedEngineException: [cvk][0] Flush failed
at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:829)
at org.elasticsearch.index.shard.service.InternalIndexShard.flush(InternalIndexShard.java:589)
at org.elasticsearch.index.translog.TranslogService$TranslogBasedFlush$1.run(TranslogService.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: No space left on device
at java.io.RandomAccessFile.writeBytes0(Native Method)
at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520)
at java.io.RandomAccessFile.write(RandomAccessFile.java:550)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.flushBuffer(FSDirectory.java:452)
at org.apache.lucene.store.BufferedChecksumIndexOutput.flushBuffer(BufferedChecksumIndexOutput.java:71)
at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:113)
at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:102)
at org.apache.lucene.store.BufferedChecksumIndexOutput.flush(BufferedChecksumIndexOutput.java:86)
at org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:92)
at org.elasticsearch.index.store.Store$StoreIndexOutput.writeBytes(Store.java:634)
at org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:52)
at org.apache.lucene.store.RAMOutputStream.writeTo(RAMOutputStream.java:65)
at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.writeBlock(BlockTreeTermsWriter.java:970)
at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:579)
at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter$FindBlocks.freeze(BlockTreeTermsWriter.java:555)
at org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:214)
at org.apache.lucene.util.fst.Builder.add(Builder.java:394)
at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1047)
at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:548)
at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:465)
at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:506)
at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:616)
at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2864)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3022)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2989)
at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:812)
... 5 more
[2014-05-03 04:20:20,321][WARN ][index.merge.scheduler ] [Molecule Man] [cvk][0] failed to merge
java.io.IOException: No space left on device
at java.io.RandomAccessFile.writeBytes0(Native Method)
at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520)
at java.io.RandomAccessFile.write(RandomAccessFile.java:550)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.flushBuffer(FSDirectory.java:452)
at org.apache.lucene.store.RateLimitedFSDirectory$RateLimitedIndexOutput.flushBuffer(RateLimitedFSDirectory.java:102)
at org.apache.lucene.store.BufferedChecksumIndexOutput.flushBuffer(BufferedChecksumIndexOutput.java:71)
at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:113)
at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:102)
at org.apache.lucene.store.BufferedChecksumIndexOutput.flush(BufferedChecksumIndexOutput.java:86)
at org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:92)
at org.elasticsearch.index.store.Store$StoreIndexOutput.writeBytes(Store.java:634)
at org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:52)
at org.apache.lucene.store.RAMOutputStream.writeTo(RAMOutputStream.java:65)
at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.writeBlock(BlockTreeTermsWriter.java:980)
at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:767)
at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter$FindBlocks.freeze(BlockTreeTermsWriter.java:555)
at org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:214)
at org.apache.lucene.util.fst.Builder.add(Builder.java:394)
at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1047)
at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$WrappedTermsConsumer.finishTerm(BloomFilterPostingsFormat.java:439)
at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:383)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4119)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3716)
at org.apache.lucene.index.TrackingSerialMergeScheduler.merge(TrackingSerialMergeScheduler.java:122)
at org.elasticsearch.index.merge.scheduler.SerialMergeSchedulerProvider$CustomSerialMergeScheduler.merge(SerialMergeSchedulerProvider.java:89)
at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:71)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1936)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1930)
at org.elasticsearch.index.merge.Merges.maybeMerge(Merges.java:47)
at org.elasticsearch.index.engine.internal.InternalEngine.maybeMerge(InternalEngine.java:926)
at org.elasticsearch.index.shard.service.InternalIndexShard$EngineMerger$1.run(InternalIndexShard.java:966)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2014-05-03 04:20:20,382][WARN ][index.engine.internal ] [Molecule Man] [cvk][0] failed engine
org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: No space left on device
at org.elasticsearch.index.merge.scheduler.SerialMergeSchedulerProvider$CustomSerialMergeScheduler.merge(SerialMergeSchedulerProvider.java:92)
at org.elasticsearch.index.merge.EnableMergeScheduler.merge(EnableMergeScheduler.java:71)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1936)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1930)
at org.elasticsearch.index.merge.Merges.maybeMerge(Merges.java:47)
at org.elasticsearch.index.engine.internal.InternalEngine.maybeMerge(InternalEngine.java:926)
at org.elasticsearch.index.shard.service.InternalIndexShard$EngineMerger$1.run(InternalIndexShard.java:966)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: No space left on device
at java.io.RandomAccessFile.writeBytes0(Native Method)
at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520)
at java.io.RandomAccessFile.write(RandomAccessFile.java:550)
at org.apache.lucene.store.FSDirectory$FSIndexOutput.flushBuffer(FSDirectory.java:452)
at org.apache.lucene.store.RateLimitedFSDirectory$RateLimitedIndexOutput.flushBuffer(RateLimitedFSDirectory.java:102)
at org.apache.lucene.store.BufferedChecksumIndexOutput.flushBuffer(BufferedChecksumIndexOutput.java:71)
at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:113)
at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:102)
at org.apache.lucene.store.BufferedChecksumIndexOutput.flush(BufferedChecksumIndexOutput.java:86)
at org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:92)
at org.elasticsearch.index.store.Store$StoreIndexOutput.writeBytes(Store.java:634)
at org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:52)
at org.apache.lucene.store.RAMOutputStream.writeTo(RAMOutputStream.java:65)
at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.writeBlock(BlockTreeTermsWriter.java:980)
at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.writeBlocks(BlockTreeTermsWriter.java:767)
at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter$FindBlocks.freeze(BlockTreeTermsWriter.java:555)
at org.apache.lucene.util.fst.Builder.freezeTail(Builder.java:214)
at org.apache.lucene.util.fst.Builder.add(Builder.java:394)
at org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1047)
at org.elasticsearch.index.codec.postingsformat.BloomFilterPostingsFormat$WrappedTermsConsumer.finishTerm(BloomFilterPostingsFormat.java:439)
at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:383)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:106)
at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4119)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3716)
at org.apache.lucene.index.TrackingSerialMergeScheduler.merge(TrackingSerialMergeScheduler.java:122)
at org.elasticsearch.index.merge.scheduler.SerialMergeSchedulerProvider$CustomSerialMergeScheduler.merge(SerialMergeSchedulerProvider.java:89)
... 9 more
[2014-05-03 04:20:20,490][DEBUG][action.bulk ] [Molecule Man] [cvk][0] failed to execute bulk item (index) index {[cvk][public][22017747], source[{"public":"22017747","name":"Private community | VK","desc":"\"\"","vol":0,"priv":null,"org":null,"phone":null,"email":null,"url":"5ghj6","wall":1,"post":null,"like":null,"share":null}]}
org.elasticsearch.index.engine.IndexFailedEngineException: [cvk][0] Index failed for [public#22017747]
at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:483)
at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:396)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:401)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:157)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:645)
at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:659)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1525)
at org.elasticsearch.index.engine.internal.InternalEngine.innerIndex(InternalEngine.java:532)
at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:470)
... 8 more
[2014-05-03 04:20:20,493][DEBUG][action.bulk ] [Molecule Man] [cvk][0], node[Sk1Eoi84TDW9anq_zQsNJg], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.bulk.BulkShardRequest@61204bff]
java.lang.NullPointerException
at org.elasticsearch.action.bulk.TransportShardBulkAction.applyVersion(TransportShardBulkAction.java:617)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:178)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:556)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:426)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2014-05-03 04:20:20,534][WARN ][cluster.action.shard ] [Molecule Man] [cvk][0] sending failed shard for [cvk][0], node[Sk1Eoi84TDW9anq_zQsNJg], [P], s[STARTED], indexUUID [m0nqEEqXQu-rHc5ipn4ZPA], reason [engine failure, message [MergeException[java.io.IOException: No space left on device]; nested: IOException[No space left on device]; ]]
[2014-05-03 04:20:20,534][WARN ][cluster.action.shard ] [Molecule Man] [cvk][0] received shard failed for [cvk][0], node[Sk1Eoi84TDW9anq_zQsNJg], [P], s[STARTED], indexUUID [m0nqEEqXQu-rHc5ipn4ZPA], reason [engine failure, message [MergeException[java.io.IOException: No space left on device]; nested: IOException[No space left on device]; ]]
Node stats:
indices: {
docs: {
count: 4439439
deleted: 0
}
store: {
size_in_bytes: 643890465
throttle_time_in_millis: 0
}
indexing: {
index_total: 2214686
index_time_in_millis: 1679906
index_current: 1
delete_total: 0
delete_time_in_millis: 0
delete_current: 0
}
get: {
total: 0
time_in_millis: 0
exists_total: 0
exists_time_in_millis: 0
missing_total: 0
missing_time_in_millis: 0
current: 0
}
search: {
open_contexts: 0
query_total: 0
query_time_in_millis: 0
query_current: 0
fetch_total: 0
fetch_time_in_millis: 0
fetch_current: 0
}
merges: {
current: 0
current_docs: 0
current_size_in_bytes: 0
total: 23
total_time_in_millis: 1081333
total_docs: 15716810
total_size_in_bytes: 5938832547
}
refresh: {
total: 8
total_time_in_millis: 0
}
flush: {
total: 202
total_time_in_millis: 677609
}
warmer: {
current: 0
total: 2
total_time_in_millis: 15
}
filter_cache: {
memory_size_in_bytes: 0
evictions: 0
}
id_cache: {
memory_size_in_bytes: 0
}
fielddata: {
memory_size_in_bytes: 0
evictions: 0
}
percolate: {
total: 0
time_in_millis: 0
current: 0
memory_size_in_bytes: -1
memory_size: -1b
queries: 0
}
completion: {
size_in_bytes: 0
}
segments: {
count: 18
memory_in_bytes: 38866707
}
translog: {
operations: 0
size_in_bytes: 0
}
}
os: {
timestamp: 1399114654034
uptime_in_millis: 701756
load_average: [
0
0.01
0.05
]
cpu: {
sys: 0
user: 0
idle: 99
usage: 0
stolen: 0
}
mem: {
free_in_bytes: 34357248
used_in_bytes: 480374784
free_percent: 33
used_percent: 66
actual_free_in_bytes: 172974080
actual_used_in_bytes: 341757952
}
swap: {
used_in_bytes: 0
free_in_bytes: 0
}
}
process: {
timestamp: 1399114654035
open_file_descriptors: 103
cpu: {
percent: 0
sys_in_millis: 118480
user_in_millis: 2057680
total_in_millis: 2176160
}
mem: {
resident_in_bytes: 263897088
share_in_bytes: 6635520
total_virtual_in_bytes: 1609924608
}
}
jvm: {
timestamp: 1399114654035
uptime_in_millis: 43582377
mem: {
heap_used_in_bytes: 80238424
heap_used_percent: 52
heap_committed_in_bytes: 152043520
heap_max_in_bytes: 152043520
non_heap_used_in_bytes: 42873536
non_heap_committed_in_bytes: 66764800
pools: {
young: {
used_in_bytes: 15877936
max_in_bytes: 41943040
peak_used_in_bytes: 41943040
peak_max_in_bytes: 41943040
}
survivor: {
used_in_bytes: 1463048
max_in_bytes: 5242880
peak_used_in_bytes: 5242880
peak_max_in_bytes: 5242880
}
old: {
used_in_bytes: 62897440
max_in_bytes: 104857600
peak_used_in_bytes: 104857600
peak_max_in_bytes: 104857600
}
}
}
threads: {
count: 36
peak_count: 40
}
gc: {
collectors: {
young: {
collection_count: 7359
collection_time_in_millis: 116960
}
old: {
collection_count: 2693
collection_time_in_millis: 131864
}
}
}
buffer_pools: {
direct: {
count: 16
used_in_bytes: 2694367
total_capacity_in_bytes: 2694367
}
mapped: {
count: 83
used_in_bytes: 635281868
total_capacity_in_bytes: 635281868
}
}
}
thread_pool: {
generic: {
threads: 2
queue: 0
active: 0
rejected: 0
largest: 6
completed: 9045
}
index: {
threads: 0
queue: 0
active: 0
rejected: 0
largest: 0
completed: 0
}
get: {
threads: 0
queue: 0
active: 0
rejected: 0
largest: 0
completed: 0
}
snapshot: {
threads: 1
queue: 0
active: 0
rejected: 0
largest: 1
completed: 442
}
merge: {
threads: 1
queue: 0
active: 0
rejected: 0
largest: 1
completed: 1297
}
suggest: {
threads: 0
queue: 0
active: 0
rejected: 0
largest: 0
completed: 0
}
bulk: {
threads: 1
queue: 0
active: 0
rejected: 0
largest: 1
completed: 2213
}
optimize: {
threads: 0
queue: 0
active: 0
rejected: 0
largest: 0
completed: 0
}
warmer: {
threads: 1
queue: 0
active: 0
rejected: 0
largest: 1
completed: 262
}
flush: {
threads: 1
queue: 0
active: 0
rejected: 0
largest: 1
completed: 205
}
search: {
threads: 0
queue: 0
active: 0
rejected: 0
largest: 0
completed: 0
}
percolate: {
threads: 0
queue: 0
active: 0
rejected: 0
largest: 0
completed: 0
}
management: {
threads: 5
queue: 0
active: 1
rejected: 0
largest: 5
completed: 7511
}
refresh: {
threads: 0
queue: 0
active: 0
rejected: 0
largest: 0
completed: 0
}
}
network: {
tcp: {
active_opens: 366578
passive_opens: 32901
curr_estab: 34
in_segs: 450996588
out_segs: 379209662
retrans_segs: 35059
estab_resets: 2230
attempt_fails: 1298
in_errs: 62
out_rsts: 6939
}
}
fs: {
timestamp: 1399114654035
total: {
total_in_bytes: 21003628544
free_in_bytes: 8092241920
available_in_bytes: 7018500096
disk_reads: 4208794
disk_writes: 5227010
disk_io_op: 9435804
disk_read_size_in_bytes: 285034193920
disk_write_size_in_bytes: 518983745536
disk_io_size_in_bytes: 804017939456
}
data: [
{
path: /var/lib/elasticsearch/elasticsearch/nodes/0
mount: /
dev: /dev/vda
total_in_bytes: 21003628544
free_in_bytes: 8092241920
available_in_bytes: 7018500096
disk_reads: 4208794
disk_writes: 5227010
disk_io_op: 9435804
disk_read_size_in_bytes: 285034193920
disk_write_size_in_bytes: 518983745536
disk_io_size_in_bytes: 804017939456
}
]
}
transport: {
server_open: 13
rx_count: 0
rx_size_in_bytes: 0
tx_count: 0
tx_size_in_bytes: 0
}
http: {
current_open: 6
total_opened: 2431
}
fielddata_breaker: {
maximum_size_in_bytes: 121634816
maximum_size: 116mb
estimated_size_in_bytes: 0
estimated_size: 0b
overhead: 1.03
}
And now the shard stays unavailable even after an ES restart. Here's the log:
[2014-05-03 07:10:18,903][INFO ][gateway ] [Mys-Tech] recovered [2] indices into cluster_state
[2014-05-03 07:10:18,905][INFO ][node ] [Mys-Tech] started
[2014-05-03 07:10:41,334][WARN ][indices.cluster ] [Mys-Tech] [cvk][0] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [cvk][0] failed recovery
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:256)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.engine.FlushNotAllowedEngineException: [cvk][0] already flushing...
at org.elasticsearch.index.engine.internal.InternalEngine.flush(InternalEngine.java:745)
at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryFinalization(InternalIndexShard.java:716)
at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:250)
at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:197)
... 3 more
[2014-05-03 07:10:44,601][WARN ][cluster.action.shard ] [Mys-Tech] [cvk][0] sending failed shard for [cvk][0], node[gknU3JzTRviIpDi4O-rc6A], [P], s[INITIALIZING], indexUUID [m0nqEEqXQu-rHc5ipn4ZPA], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[cvk][0] failed recovery]; nested: FlushNotAllowedEngineException[[cvk][0] already flushing...]; ]]
[2014-05-03 07:10:44,602][WARN ][cluster.action.shard ] [Mys-Tech] [cvk][0] received shard failed for [cvk][0], node[gknU3JzTRviIpDi4O-rc6A], [P], s[INITIALIZING], indexUUID [m0nqEEqXQu-rHc5ipn4ZPA], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[cvk][0] failed recovery]; nested: FlushNotAllowedEngineException[[cvk][0] already flushing...]; ]]
Upvotes: 5
Views: 5444
Reputation: 27517
So first of all, you appear to be running Elasticsearch with the indices stored on the root partition:
Filesystem Size Used Avail Use% Mounted on
/dev/vda 20G 13G 6.6G 65% /
udev 237M 12K 237M 1% /dev
tmpfs 50M 216K 49M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 246M 0 246M 0% /run/shm
That's generally not the best idea if you can afford to mount another drive.
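For example, with a second disk mounted at a hypothetical /mnt/esdata, you could point the data path there in /etc/elasticsearch/elasticsearch.yml and restart the node:
# /etc/elasticsearch/elasticsearch.yml
path.data: /mnt/esdata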
The failure happened during a Lucene segment merge, which generally requires a significant amount of free space. With disk usage already at 65% on a very small partition of only 20 GB, you can easily run out of space, particularly since you are competing with the disk needs of every other process on the machine at the same time. There is more detail here on managing and configuring the Elasticsearch merge policy:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-merge.html
You are probably not going to be able to reliably index and manage 9 GB of data on a 20 GB partition that is also the root partition, particularly if you change the data a lot. You can try to configure it to avoid or reduce segment merges, which can help with disk space, but this still may not work.
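As a rough sketch, one knob that limits the temporary disk spike during merges is the maximum merged segment size; the value below is purely illustrative, and depending on the exact 1.x release you may need to set it in elasticsearch.yml rather than through the dynamic index settings API:
curl -XPUT 'localhost:9200/cvk/_settings' -d '{
  "index.merge.policy.max_merged_segment": "1gb"
}'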
Regarding why it takes up as much space as it does: this is a function of how you map your data, but in general Elasticsearch defaults to storing a copy of all the data in its original form, plus the indexes for each individual field.
If you really, really need to fit into a 20 GB system, I'd take a close look at your mappings and see which fields you can avoid indexing or storing:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-source-field.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html
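For example, a new index created with a mapping roughly like the one below disables the _all field, skips indexing a field you only ever retrieve, and, if you can live without fetching the original documents back, drops _source as well. The index name and field choice are placeholders, and you would have to reindex your data into it:
curl -XPUT 'localhost:9200/cvk_v2' -d '{
  "mappings": {
    "public": {
      "_all": { "enabled": false },
      "_source": { "enabled": false },
      "properties": {
        "desc": { "type": "string", "index": "no" }
      }
    }
  }
}'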
Upvotes: 2
Reputation: 65
The problem really was disk space. For reasons unknown to me at the time, ES took up all the free disk space. Here's what happened:
I added about 75,000 documents to the index via the bulk API (all successful).
Then I didn't touch ES at all and just monitored disk space.
Within 5 minutes all the space was taken by a few files in /var/lib/elasticsearch/elasticsearch/nodes/0/indices/cvk/0/index/. The largest was _3ya.fdt (about 3 GB), and right before the shard was lost there were files named _3ya_es090_0 with extensions like .tim, .pos and .doc, about 400 MB each. After the shard was lost, all those files were gone.
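For anyone reproducing this, a simple way to watch the shard directory grow while indexing (path as reported in the node stats above) is:
watch -n 10 du -sh /var/lib/elasticsearch/elasticsearch/nodes/0/indices/cvk/0/index/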
So the obvious solution is to add disk space.
But this raises new questions:
Why does ES take roughly 10x more disk space than the size of the data being added?
Is there a way to know when to stop adding new documents to an existing shard?
Will it help if we create several shards instead of one?
Any other suggestions on how to get the most out of the current server? It has 20 GB of space, and we only need to index about 9 GB of data for a small research project.
Upvotes: 0