Reputation: 529
I have a .csv file that is 25 GB in total size. I am attempting to read it in (line by line for now), but I keep running into an OutOfMemoryError: Java heap space and I can't figure out why. After googling around for a while, I came up with the following code:
from("file:/home/justin/data/?fileName=in.csv&noop=true")//.streamCaching()
.split().tokenize("\n", 10000000).streaming()
.unmarshal(csv)
.process(new CsvParserProcess())
.marshal(csv)
.to("file:/home/justin/data/?fileName=out.csv").log("Finished Transformation").end();
I run into the OutOfMemoryError after 5 seconds of running.
My intuition tells me "oh, when you reach near-complete memory saturation, flush out old unused contents," but I am unsure how to do this in the context of Apache Camel (or really manually in Java, for that matter; I've been migrating from C).
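(Aside: in plain Java you don't flush memory by hand; the garbage collector reclaims anything no longer referenced, so the trick is simply to never hold more than one line (or one batch) at a time. A minimal sketch of that pattern outside Camel, where transform is a hypothetical stand-in for the per-line work of CsvParserProcess:

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class LineByLine {
        public static void main(String[] args) throws IOException {
            Path in = Paths.get("/home/justin/data/in.csv");
            Path out = Paths.get("/home/justin/data/out.csv");
            try (BufferedReader reader = Files.newBufferedReader(in);
                 BufferedWriter writer = Files.newBufferedWriter(out)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Only the current line is referenced; once the loop
                    // advances, the previous String becomes garbage and the
                    // GC reclaims it -- no manual flushing needed.
                    writer.write(transform(line));
                    writer.newLine();
                }
            }
        }

        // Hypothetical stand-in for whatever CsvParserProcess does per line.
        private static String transform(String line) {
            return line;
        }
    }

This runs in constant memory regardless of file size.)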
My other solution was a very expensive brute-force option: piping the file into a stream one line at a time via Camel's stream endpoint, which maybe works? I just haven't wanted to sit around and wait for it to finish.
from("stream:file?fileName=/home/justin/data/in.csv")
.streamCaching().split().tokenize("\n")
.unmarshal(csv)
.process(new CsvParserProcess())
.marshal(csv)
.to("file:/home/justin/data/?fileName=out.csv&fileExist=Append").log("done").end();
Does anyone have any ideas on how I can avoid the OutOfMemoryError?
Edit: I forgot that my "improved" code had .streaming() after I tokenized the file. It still results in the same error, however. :(
Upvotes: 2
Views: 2964
Reputation: 529
Maybe before I ripped out my hair (and went to places on the internet I can never unsee) I should have done a little research on Occam's razor... It turns out that I cannot count as well as I originally thought: the group size I was passing to tokenize was 10000000 when it should have been 1000000.
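For completeness, a sketch of the corrected route (same endpoints and csv data format as in the question; I've also carried over fileExist=Append from the second route, since every split batch is written to the same output file):

    from("file:/home/justin/data/?fileName=in.csv&noop=true")
        // 1,000,000 lines per exchange instead of 10,000,000,
        // so each batch now fits comfortably in the heap
        .split().tokenize("\n", 1000000).streaming()
        .unmarshal(csv)
        .process(new CsvParserProcess())
        .marshal(csv)
        .to("file:/home/justin/data/?fileName=out.csv&fileExist=Append")
        .log("Finished Transformation").end();

With 10,000,000 lines per group, a single exchange body can easily grow to a gigabyte or more on a 25 GB file, which blows the default heap; the smaller group size keeps each batch bounded.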
Upvotes: 1