j-money

Reputation: 529

Apache Camel out of memory exception

I have a .csv file that is 25 GB in total. I am attempting to read it in (line by line for now), but I keep running into an OutOfMemoryError: Java heap space and I can't figure out why. After googling around for a while, I came up with the following code:

from("file:/home/justin/data/?fileName=in.csv&noop=true")//.streamCaching()
    .split().tokenize("\n", 10000000).streaming()
    .unmarshal(csv)
    .process(new CsvParserProcess())
    .marshal(csv)
    .to("file:/home/justin/data/?fileName=out.csv")
    .log("Finished Transformation")
    .end();

The OutOfMemoryError occurs after about 5 seconds of running.

My intuition tells me, "Oh, when you get close to complete memory saturation, flush out old unused contents." However, I am unsure how to do this in the context of Apache Camel (or really manually in Java, for that matter; I've been migrating from C).

My other solution was a very expensive brute-force option: piping (?) the file into a stream one line at a time from Camel's stream endpoint. It might work, but I haven't wanted to sit around and wait for it to finish.

from("stream:file?fileName=/home/justin/data/in.csv")
    .streamCaching().split().tokenize("\n")
    .unmarshal(csv)
    .process(new CsvParserProcess())
    .marshal(csv)
    .to("file:/home/justin/data/?fileName=out.csv&fileExist=Append")
    .log("done")
    .end();

Does anyone have any ideas for how I can avoid the OutOfMemoryError?

Edit: I forgot that my "improved" code had .streaming() after I tokenized the file. It still however results in the same error :(

Upvotes: 2


Answers (1)

j-money

Reputation: 529

Maybe before I ripped out my hair (and went to places on the internet I can never unsee), I should have done a little research on Occam's razor. It turns out that I cannot count as well as I originally thought: the buffer I was creating with tokenize("\n", 10000000) was of size 10000000 when it should actually have been 1000000.
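To show why the group size matters so much, here is a minimal plain-Java sketch of grouped, streaming line reading. It is not Camel's implementation, only an illustration of the idea behind splitting on "\n" in groups of N lines; the class name GroupedLineReader and the method forEachGroup are hypothetical names made up for this example. The point is that only one group of lines is held in memory at a time, so peak heap use grows roughly in proportion to the group size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not part of Apache Camel.
public class GroupedLineReader {

    /**
     * Reads lines lazily from the source and passes them to the handler in
     * batches of at most groupSize lines. Returns the number of batches.
     */
    public static int forEachGroup(Reader source, int groupSize,
                                   Consumer<List<String>> handler) throws IOException {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        int groups = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            List<String> batch = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == groupSize) {
                    handler.accept(batch);
                    groups++;
                    // Start a fresh list so the previous batch can be garbage collected.
                    batch = new ArrayList<>();
                }
            }
            // Hand over the final, possibly smaller, batch.
            if (!batch.isEmpty()) {
                handler.accept(batch);
                groups++;
            }
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        List<Integer> sizes = new ArrayList<>();
        int groups = forEachGroup(new StringReader("a\nb\nc\nd\ne\n"), 2,
                batch -> sizes.add(batch.size()));
        System.out.println(groups + " groups, sizes " + sizes); // prints "3 groups, sizes [2, 2, 1]"
    }
}
```

With this shape, every line in the current group is alive on the heap until the handler returns. Assuming lines of around 100 characters (an assumption, I never measured mine), a group of 10000000 lines means at least about 1 GB of raw text per group before any parsing overhead, while 1000000 lines is roughly a tenth of that.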

Upvotes: 1
