Reputation: 21
I am trying to process a large CSV file of approximately 1 million records. After reading the rows (line by line or in chunks), I need to push them to camel-flatpack
to create a map of field names and their values.
My requirement is to feed all the CSV records through a flatpack config and generate a java.util.Map from them.
There have been several posts on Stack Overflow that solve this with a splitter, and my process runs fast up to about 35,000 records, but after that it slows down.
I even tried adding a throttler, but it still doesn't help: I get a GC out-of-memory error. I also raised JAVA_MIN_MEM, JAVA_MAX_MEM, JAVA_PERM_MEM and JAVA_MAX_PERM_MEM, but the result is the same. The Hawtio console shows that heap memory (JAVA_HEAP_MEMORY) climbs above 95% after about 5-6 minutes.
Here is my code snippet:
<route id="poller-route">
<from uri="file://temp/output?noop=true&maxMessagesPerPoll=10&delay=5000"/>
<split streaming="true" stopOnException="false">
<tokenize token="\n" />
<to uri="flatpack:delim:flatpackConfig/flatPackConfig.pzmap.xml?ignoreFirstRecord=false"/>
</split>
</route>
<route id="output-route">
<from uri="flatpack:delim:flatpackConfig/flatPackConfig.pzmap.xml?ignoreFirstRecord=false"/>
<convertBodyTo type="java.util.Map"/>
<to uri="mock:result"/>
</route>
Upvotes: 2
Views: 716
Reputation: 592
One potential problem is that when you create a hash map and keep adding data to it, it has to rehash as it grows. For example, if I have a hash table of size 3 and insert 0, 1, 2, 3, and my hash function is mod 3, then 3 maps to the zero slot, which is already occupied, so I either have to store the overflow or rebuild a larger table.
I believe this is essentially how Java implements its HashMap, so you could try initializing your HashMap with an initial capacity close to the number of records.
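A minimal sketch of that suggestion, assuming you know (or can estimate) the record count up front; the names here are just illustrative:

import java.util.HashMap;
import java.util.Map;

public class PresizedMapExample {
    public static void main(String[] args) {
        int expectedRecords = 1_000_000; // rough estimate of the CSV row count

        // Pre-size the map so it does not have to resize/rehash while loading.
        // HashMap resizes once size exceeds capacity * loadFactor (0.75 by default),
        // so divide by the load factor to leave enough headroom.
        Map<String, String> recordIndex =
                new HashMap<>((int) (expectedRecords / 0.75f) + 1);

        // ... populate recordIndex with field name/value pairs as rows are parsed ...
        recordIndex.put("exampleField", "exampleValue");

        System.out.println("Initialized with capacity for " + expectedRecords + " entries");
    }
}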
Upvotes: 0