Reputation: 139
I'm loading a very large data set into R from Java. I have written a Java program that calls R using rJava's JRI. The program is packaged as an executable jar file and is run from the terminal (Linux). The data is in the region of 50 columns by 13.7 million rows. R alone can handle this without a problem. However, when I run it from the Java program, I get a java.lang.OutOfMemoryError (Java heap space).
The odd thing is that it works when I run it with half the rows, yet R should only be sending the names of the variables (50 in total) back to Java, regardless of how many rows there are. This is the code I'm using:
re.eval("names(data <- read.csv(file=\"data.csv\", head=TRUE, sep=\",\"))");
My understanding is that re.eval evaluates an expression in R and sends the result back to Java. Is there any way to evaluate the expression without having the result returned to Java?
I asked a similar question before: Evaluating expressions called from Java in R. Out of Memory Error: Java Heap
Upvotes: 1
Views: 672
Reputation: 7181
Have you tried adjusting the JVM heap size by passing options when you start the executable?
For example:
java -Xmx1024m -Xms1024m -jar myJar.jar
You can adjust the memory values, of course; the -Xmx option sets the maximum heap size for the JVM and -Xms sets the initial size.
This may help if you are processing a lot of data that you actually need to retrieve. Otherwise, an approach where no data comes back to Java (as suggested by cdeszaq) would suit you best.
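To confirm that the options actually took effect, you could print the heap limit from inside the program. This is only a minimal sketch; the class name HeapCheck and its methods are made up for illustration, and the reported value can be somewhat lower than the -Xmx figure depending on the JVM and garbage collector.

```java
// Hypothetical helper (not part of rJava/JRI): reports the heap limit
// that the running JVM was actually started with.
final class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Heap currently reserved by the JVM, in MiB (grows up to the maximum).
    static long currentHeapMiB() {
        return Runtime.getRuntime().totalMemory() / (1024L * 1024L);
    }

    // One-line summary suitable for logging at startup.
    static String report() {
        return "max heap: " + maxHeapMiB() + " MiB, current heap: "
                + currentHeapMiB() + " MiB";
    }
}
```

Calling System.out.println(HeapCheck.report()) at the start of main, before the first re.eval call, shows whether the jar was launched with the heap size you intended.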
Upvotes: 1
Reputation: 31300
One way to call R without having anything come back to Java would be to run R as an external process. Since that is roughly what you are doing anyway, having the OS execute the call to R, rather than the library inside Java, might prevent the heap from filling up.
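A rough sketch of the external-process idea, using only the standard library's ProcessBuilder. It assumes that Rscript is installed and on the PATH, and the class name ExternalR and its methods are invented for illustration. Note that the R session ends when the process exits, so this only fits if the whole R job can run in one script.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs R in its own OS process, so the data set
// is loaded by R alone and never passes through the JVM.
final class ExternalR {

    // Builds the command line. Assumes "Rscript" is on the PATH and that
    // csvPath contains no single quotes (it is pasted into R source as-is).
    static List<String> buildCommand(String csvPath) {
        // R accepts single-quoted strings, which avoids escaping in Java.
        String expr = "data <- read.csv(file='" + csvPath
                + "', header=TRUE, sep=','); "
                + "cat(names(data), sep='\\n')"; // print column names only
        return Arrays.asList("Rscript", "-e", expr);
    }

    // Starts the process, forwards its output to this terminal,
    // waits for it to finish and returns its exit code.
    static int run(List<String> command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

Something like ExternalR.run(ExternalR.buildCommand("data.csv")) would then print the column names to the terminal. If Java needs the names, you could read the process output stream instead of calling inheritIO().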
Upvotes: 0