Reputation: 7574
I have a Windows service that does some maintenance work. Recently we added a job that attempts to precalculate some search results using Lucene, and since then the service has started throwing OutOfMemory (OOM) exceptions.
Some details which I got from WinDbg and SOS:
0:034> !analyzeoom
Managed OOM occured after GC #176014 (Requested to allocate 2621440 bytes)
Reason: Low on memory during GC
Detail: SOH: Failed to reserve memory (16777216 bytes)
The last lines of the !dumpheap -stat output:
65fe4944 81900 34614564 System.Byte[]
65fe2938 76014 35904328 System.Int32[]
65f96064 74 39988372 System.Int64[]
65fdf9ac 3208118 150302932 System.String
00265090 363 247694656 Free
Total 9035539 objects
So there is free memory, but it is fragmented and all free blocks are smaller than 16 MB (the default segment allocation size). The arrays of bytes, ints, and longs are held by the Lucene cache. The cache is activated because of a query that uses sorting. The Lucene cache implementation is based on a WeakReferenceHashMap and should therefore be cleaned up by the garbage collector in case of memory starvation.
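The weak-reference caching behavior I am relying on can be sketched in plain Java (the class below is illustrative, not Lucene code): once the only strong reference to a cache key is dropped, the GC is free to reclaim the entry, which is why the cache should shrink under memory pressure.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakCacheDemo {
    // Returns how many cache entries survive after the only strong
    // reference to the key is dropped and the GC is prodded.
    static int run() throws InterruptedException {
        Map<Object, long[]> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, new long[1024]);   // stands in for a cached field array
        key = null;                       // no strong reference to the key remains
        for (int i = 0; i < 50 && !cache.isEmpty(); i++) {
            System.gc();                  // only a hint, but usually honored
            Thread.sleep(10);
        }
        return cache.size();              // stale entries are expunged here
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("entries left after GC: " + run());
    }
}
```

Note that collection is only possible while nothing else holds the key strongly; if a reader keeps the key alive, the cache entry stays too.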
The !heapstat command output:
0:034> !heapstat
Heap        Gen0     Gen1       Gen2        LOH
Heap0    1643476  2689484  526084512  196389976

Free space:
Heap0         12       12  170262384   77432248   (SOH: 32%, LOH: 39%)
The exception dump from a log file looks like:
Quartz.Core.ErrorLogger - Job (DEFAULT.precalculate-similar-index threw an exception.
Quartz.SchedulerException: Job threw an unhandled exception. ---> System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at Lucene.Net.Search.FieldCacheImpl.LongCache.CreateValue(IndexReader reader, Entry entryKey) in C:\Dev\Lucene.Net_2_9_2\src\Lucene.Net\Search\FieldCacheImpl.cs:line 685
at Lucene.Net.Search.FieldCacheImpl.Cache.Get(IndexReader reader, Entry key) in C:\Dev\Lucene.Net_2_9_2\src\Lucene.Net\Search\FieldCacheImpl.cs:line 240
at Lucene.Net.Search.FieldCacheImpl.GetLongs(IndexReader reader, String field, LongParser parser) in C:\Dev\Lucene.Net_2_9_2\src\Lucene.Net\Search\FieldCacheImpl.cs:line 639
at Lucene.Net.Search.FieldCacheImpl.LongCache.CreateValue(IndexReader reader, Entry entryKey) in C:\Dev\Lucene.Net_2_9_2\src\Lucene.Net\Search\FieldCacheImpl.cs:line 667
at Lucene.Net.Search.FieldCacheImpl.Cache.Get(IndexReader reader, Entry key) in C:\Dev\Lucene.Net_2_9_2\src\Lucene.Net\Search\FieldCacheImpl.cs:line 240
at Lucene.Net.Search.FieldCacheImpl.GetLongs(IndexReader reader, String field, LongParser parser) in C:\Dev\Lucene.Net_2_9_2\src\Lucene.Net\Search\FieldCacheImpl.cs:line 639
at Lucene.Net.Search.FieldComparator.LongComparator.SetNextReader(IndexReader reader, Int32 docBase) in C:\Dev\Lucene.Net_2_9_2\src\Lucene.Net\Search\FieldComparator.cs:line 481
The only idea I have so far is that the exception is caused by memory fragmentation. Unfortunately, that still leaves the question of why the memory is not compacted.
We don't pin any objects, and it seems that Lucene doesn't either, although the !gcroot command returns the following result for some objects:
DOMAIN(0025D260):HANDLE(Pinned):1f13ec:Root: 02393250(System.Object[]) - from !gcroot
ESP:16f2e4: sizeof(02393250) = 123436600 ( 0x75b7e38) bytes (System.Object[]) - size of the pinned arrays of objects
System: Windows Server 2008 R2 32 bit
Total committed bytes: ~950 MB
Total reserved bytes: ~1,666 MB (numbers are taken from Performance Monitor)
The index searcher, and therefore the associated index reader, is closed regularly after each short batch is done. After that, a new batch is scheduled and work continues. The OOM appears after a couple of hours of running. The exception is caught, and the service continues to run.
Upvotes: 1
Views: 3722
Reputation: 7574
There wasn't a leak; rather, because of high CPU and memory load, the GC threw catchable System.OutOfMemoryException (OOM) exceptions. So on the one hand the process continued working, while on the other hand the index wasn't being updated.
At least I managed to lower the memory pressure somewhat, and now the system is working OK.
As Simon Svensson pointed out, the GC cannot collect the cache entries while readers are still open, so I decided to check how the code was dealing with readers. It turned out that there was a relatively high number of unnecessary places where the index was opened. Opening the index in one place and passing it as a parameter made the problem go away.
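The fix can be illustrated with a minimal sketch (the Reader class below is a hypothetical stand-in for a Lucene IndexReader, not the real API). Counting opens shows the difference between each batch opening its own reader and one reader being opened once and passed as a parameter:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedReaderDemo {
    static final AtomicInteger opens = new AtomicInteger();

    // Hypothetical stand-in for an index reader; each construction
    // counts as one index open.
    static class Reader implements AutoCloseable {
        Reader() { opens.incrementAndGet(); }
        int search() { return 0; }
        @Override public void close() { }
    }

    // Anti-pattern found in our code: every batch opens its own reader,
    // keeping extra cache entries alive and multiplying memory pressure.
    static void batchOpensItsOwn(int batches) {
        for (int i = 0; i < batches; i++) {
            try (Reader r = new Reader()) {
                r.search();
            }
        }
    }

    // The fix: open the reader once and pass it to each batch as a parameter.
    static void batchWithSharedReader(int batches) {
        try (Reader r = new Reader()) {
            for (int i = 0; i < batches; i++) {
                r.search();
            }
        }
    }

    public static void main(String[] args) {
        opens.set(0);
        batchOpensItsOwn(10);
        System.out.println("own reader per batch: " + opens.get() + " opens");
        opens.set(0);
        batchWithSharedReader(10);
        System.out.println("shared reader:        " + opens.get() + " open");
    }
}
```

With a single shared reader there is one set of cache entries instead of one per batch, which is what lowered the pressure in our case.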
Upvotes: 1