Reputation: 1
Environment: There are two Linux nodes (8U, 16G, 64-bit) in my cluster. Ignite native persistence is enabled, the off-heap memory size is 3.2G, and the Ignite version is 2.6.0.
Usage: There are three caches, with only one backup configured in the cluster, and about 100,000 records in total. The number of rebalance threads is 1.
My question: When I execute the SQL query "select *" repeatedly, an "out of memory" error occurs. I analyzed the heap dump with MAT and found that a single thread consumes most of the memory. I then analyzed that thread's stack and found that in the Ignite method org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.rebalanceIterator, a TreeMap was created that retains nearly 7G of memory! Can somebody help me solve this out of memory problem?
JVM options:
-server
-XX:MaxDirectMemorySize=512m
-Xms10g -Xmx10g
-XX:+AlwaysPreTouch
-XX:+UseG1GC
-XX:+AggressiveOpts
-XX:+ScavengeBeforeFullGC
-XX:+DisableExplicitGC
-XX:+UseNUMA
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+PrintAdaptiveSizePolicy
-XX:+UnlockDiagnosticVMOptions
-XX:+G1PrintRegionLivenessInfo
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
-XX:G1ReservePercent=15
-XX:InitiatingHeapOccupancyPercent=45
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
|Name | Shallow Heap | Retained Heap |Context Class Loader |Is Daemon
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
org.apache.ignite.thread.IgniteThread @ 0x5a0d32f20 |sys-#86%paloma2[172.17.0.1#192.154.163.17]%| 136 | 7,234,644,440 |org.springframework.boot.loader.LaunchedURLClassLoader @ 0x5a0ba53b8|false
|- at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.file.IgniteNativeIoLib.pread(ILcom/sun/jna/Pointer;Lcom/sun/jna/NativeLong;Lcom/sun/jna/NativeLong;)Lcom/sun/jna/NativeLong; (Native Method) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.file.AlignedBuffersDirectFileIO.readIntoAlignedBuffer(Ljava/nio/ByteBuffer;J)I (AlignedBuffersDirectFileIO.java:347) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.file.AlignedBuffersDirectFileIO.read(Ljava/nio/ByteBuffer;J)I (AlignedBuffersDirectFileIO.java:215) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(JLjava/nio/ByteBuffer;Z)V (FilePageStore.java:351) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(IJLjava/nio/ByteBuffer;Z)V (FilePageStoreManager.java:328) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(IJLjava/nio/ByteBuffer;)V (FilePageStoreManager.java:312) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(IJZ)J (PageMemoryImpl.java:779) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(IJ)J (PageMemoryImpl.java:624) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(Lorg/apache/ignite/internal/processors/cache/CacheGroupContext;Lorg/apache/ignite/internal/processors/cache/GridCacheSharedContext;Lorg/apache/ignite/internal/pagemem/PageMemory;Lorg/apache/ignite/internal/processors/cache/persistence/CacheDataRowAdapter$RowData;)V (CacheDataRowAdapter.java:140)| | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(Lorg/apache/ignite/internal/processors/cache/CacheGroupContext;Lorg/apache/ignite/internal/processors/cache/persistence/CacheDataRowAdapter$RowData;)V (CacheDataRowAdapter.java:102) | | | | |
|- at org.apache.ignite.internal.processors.cache.tree.DataRow.<init>(Lorg/apache/ignite/internal/processors/cache/CacheGroupContext;IJILorg/apache/ignite/internal/processors/cache/persistence/CacheDataRowAdapter$RowData;)V (DataRow.java:54) | | | | |
|- at org.apache.ignite.internal.processors.cache.tree.CacheDataRowStore.dataRow(IIJLorg/apache/ignite/internal/processors/cache/persistence/CacheDataRowAdapter$RowData;)Lorg/apache/ignite/internal/processors/cache/persistence/CacheDataRow; (CacheDataRowStore.java:73) | | | | |
|- at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.getRow(Lorg/apache/ignite/internal/processors/cache/persistence/tree/io/BPlusIO;JILjava/lang/Object;)Lorg/apache/ignite/internal/processors/cache/persistence/CacheDataRow; (CacheDataTree.java:146) | | | | |
|- at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.getRow(Lorg/apache/ignite/internal/processors/cache/persistence/tree/io/BPlusIO;JILjava/lang/Object;)Ljava/lang/Object; (CacheDataTree.java:41) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer(JLorg/apache/ignite/internal/processors/cache/persistence/tree/io/BPlusIO;II)Z (BPlusTree.java:4660) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.init(JLorg/apache/ignite/internal/processors/cache/persistence/tree/io/BPlusIO;I)V (BPlusTree.java:4562) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.access$5300(Lorg/apache/ignite/internal/processors/cache/persistence/tree/BPlusTree$ForwardCursor;JLorg/apache/ignite/internal/processors/cache/persistence/tree/io/BPlusIO;I)V (BPlusTree.java:4501) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(Ljava/lang/Object;Ljava/lang/Object;)Lorg/apache/ignite/internal/util/lang/GridCursor; (BPlusTree.java:927) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Lorg/apache/ignite/internal/util/lang/GridCursor; (BPlusTree.java:959) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(Ljava/lang/Object;Ljava/lang/Object;)Lorg/apache/ignite/internal/util/lang/GridCursor; (BPlusTree.java:950) | | | | |
|- at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.cursor()Lorg/apache/ignite/internal/util/lang/GridCursor; (IgniteCacheOffheapManagerImpl.java:1483) | | | | |
|- at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.cursor()Lorg/apache/ignite/internal/util/lang/GridCursor; (GridCacheOffheapManager.java:1555) | | | | |
|- at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.reservedIterator(ILorg/apache/ignite/internal/processors/affinity/AffinityTopologyVersion;)Lorg/apache/ignite/internal/util/lang/GridCloseableIterator; (IgniteCacheOffheapManagerImpl.java:839) | | | | |
|- at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.rebalanceIterator(Lorg/apache/ignite/internal/processors/cache/distributed/dht/preloader/IgniteDhtDemandedPartitionsMap;Lorg/apache/ignite/internal/processors/affinity/AffinityTopologyVersion;)Lorg/apache/ignite/internal/processors/cache/IgniteRebalanceIterator; (IgniteCacheOffheapManagerImpl.java:882) | | | | |
| |- <local> org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager @ 0x5a0d33368 | | 64 | 1,648 | |
| |- <local> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtDemandedPartitionsMap @ 0x5a0d33408 | | 24 | 24 | |
| |- <local> org.apache.ignite.internal.processors.affinity.AffinityTopologyVersion @ 0x5a0d33420 | | 24 | 24 | |
| |- <local>java.util.TreeMap @ 0x5a0d33438 | | 48 | 7,227,513,912 | |
| |- <local> java.util.HashSet @ 0x5a0d33468 | | 16 | 64 | |
| |- <local> java.util.Collections$UnmodifiableCollection$1 @ 0x5a0d33478 | | 24 | 80 | |
| |- <local> java.lang.Integer @ 0x5ad5cd658 354 | | 16 | 16 | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Clue: the TreeMap is created in the following code:
/** {@inheritDoc} */
@Override public IgniteRebalanceIterator rebalanceIterator(IgniteDhtDemandedPartitionsMap parts,
    final AffinityTopologyVersion topVer)
    throws IgniteCheckedException {
    final TreeMap<Integer, GridCloseableIterator<CacheDataRow>> iterators = new TreeMap<>();
    Set<Integer> missing = new HashSet<>();

    for (Integer p : parts.fullSet()) {
        GridCloseableIterator<CacheDataRow> partIter = reservedIterator(p, topVer);

        if (partIter == null) {
            missing.add(p);
            continue;
        }

        iterators.put(p, partIter);
    }

    IgniteHistoricalIterator historicalIterator = historicalIterator(parts.historicalMap(), missing);
    IgniteRebalanceIterator iter = new IgniteRebalanceIteratorImpl(iterators, historicalIterator);

    for (Integer p : missing)
        iter.setPartitionMissing(p);

    return iter;
}
Upvotes: 0
Views: 213
Reputation: 3591
It looks like rebalancing and select * queries were running concurrently in the cluster.
Rebalancing is a process that moves partitions between nodes to achieve the configured backup factor. All rebalanced data passes through the heap, so heap usage may grow considerably during this process. You can read about it here: https://apacheignite.readme.io/docs/rebalancing
Rebalancing can be tracked through the corresponding messages in the log or by waiting on the future returned by the IgniteCache#rebalance() method.
select * is also a fairly memory-hungry operation, since it requires all data to be loaded onto a single node.
Using lazy queries may help in this situation. You can enable this flag either in the JDBC connection string or through the SqlFieldsQuery#setLazy method.
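As a rough illustration of the JDBC variant, here is a minimal sketch that only assumes the Ignite thin JDBC driver is on the classpath. The host 127.0.0.1, the table name Person, and the class and method names are placeholders made up for this example, and the exact parameter syntax of the connection string should be checked against the documentation for your Ignite version. The programmatic counterpart would be calling setLazy(true) on the SqlFieldsQuery before executing it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class LazySelectSketch {
    /** Builds a thin-driver JDBC URL with the lazy flag enabled (host is a placeholder). */
    static String lazyUrl(String host) {
        return "jdbc:ignite:thin://" + host + "?lazy=true";
    }

    /** Runs "select *" on the given table and counts rows while iterating one row at a time. */
    static long countRows(String url, String table) throws SQLException {
        long rows = 0;

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select * from " + table)) {
            while (rs.next())
                rows++;
        }

        return rows;
    }

    public static void main(String[] args) {
        String url = lazyUrl("127.0.0.1");

        System.out.println("Connecting with " + url);

        try {
            // "Person" is a hypothetical table name; replace it with one of your own.
            System.out.println("Rows read: " + countRows(url, "Person"));
        }
        catch (SQLException e) {
            // Reached when no driver is on the classpath or no node is listening.
            System.out.println("Could not run the query: " + e.getMessage());
        }
    }
}
```

Iterating over the ResultSet instead of collecting every row into a list keeps the client side from accumulating the whole table, which is the point of turning the lazy flag on.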
Upvotes: 1