JordanBelf

Reputation: 3338

Why does loading a model take so much time in R?

For a personal project I need to run several machine learning algorithms against different texts in order to classify them.

I used to do this with RapidMiner, but I decided to move all my development to R, where I feel I have more control.

The issue I am seeing now (which I did not notice with RapidMiner) is that loading the models is taking a lot of time.

For example:

I have a model which checks whether a text refers to sports. The model file is 37.7 MB and it takes 8 minutes 34 seconds to load on my 2.2 GHz i7 Mac with 4 GB of RAM.

The way I am calling the model is the following:

# Build the file names for the document-term matrix and the trained model
fileNameMatrix = paste(query, query1, "-matrix.Rd", sep = "")
fileNameModel = paste(query, query1, "-model.Rd", sep = "")

# Restore both objects into the workspace
load(fileNameMatrix)
load(fileNameModel)

The model was generated using RTextTools

The query variables are there because I need to call almost 20 models and compare them against different datasets. That is why, although 8 minutes is not a lot on its own, reading all of them takes almost 3 hours just on loading, which makes my task almost useless considering it is a near real-time task.
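Roughly, the overall loop looks like this (the query values below are hypothetical placeholders; in practice there are close to 20 model/matrix pairs):

# Sketch of the loading loop with hypothetical query values
queries <- c("sports", "politics", "finance")
query1 <- "-v1"   # hypothetical second name component

for (query in queries) {
  load(paste(query, query1, "-matrix.Rd", sep = ""))
  load(paste(query, query1, "-model.Rd", sep = ""))
  # ... classify the incoming texts with the loaded model here
}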

What factors should I consider to reduce loading time, given that reducing the size of the model is not an option?

One other thing I find suspicious is that while the matrix file is rather small (64 KB), the model is 37.7 MB. Is it possible that the model file is bigger than necessary? Has anyone experienced something similar with RTextTools?

This is one of my first tasks using models in R, so excuse me if I am doing something that is obviously wrong.

Thanks a lot for your time; any tip in the right direction will be much appreciated!

Upvotes: 1

Views: 1022

Answers (2)

jclancy

Reputation: 52318

Have you checked the RAM usage in your Activity Monitor? Compressed RData files are relatively tiny, but they can uncompress to something massive. For instance, an n x n matrix of all 0's takes up essentially no space on disk for any n (that may explain your small matrix file), yet your loaded model might be huge in memory; I have some RData files of maybe 200 MB that cannot even be loaded into memory in R. This becomes a problem if you are running low on RAM, as your computer may start using drive space (swapping) to load the files.
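A quick way to see this effect (a minimal sketch, not part of the original answer):

# A large all-zero matrix compresses to almost nothing on disk,
# but still occupies its full size in RAM once loaded.
m <- matrix(0, nrow = 5000, ncol = 5000)
print(object.size(m), units = "MB")   # roughly 190 MB in memory

tmp <- tempfile(fileext = ".Rd")
save(m, file = tmp)                   # save() compresses by default
file.info(tmp)$size                   # only a few KB on disk

rm(m)
load(tmp)                             # re-inflates to ~190 MB of RAM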

Upvotes: 2

ozjimbob

Reputation: 300

I'm not familiar with the model output from RTextTools, but it's pretty common for a model object to be significantly larger than the input data frame. For example, the output of a glm contains all the input data, as well as predicted values, residuals, coefficients, errors, you name it. The output of a randomForest model contains the input data as well as the definitions of thousands of trees, and so on.

How does the loading time of the models compare to running them from scratch? Have you looked inside the model object to see what it contains, with a view to pruning off any statistics you don't need? For example:

loadedNames <- load(fileNameModel)   # load() returns the name(s) of the object(s) it restores
str(get(loadedNames[1]))             # inspect the structure of the model object itself
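As a rough sketch of what pruning can look like, using a plain glm as a stand-in (the internals of an RTextTools model object may store different components, so the slot names below are only illustrative):

# Drop heavy components that are not needed for prediction, then compare sizes
fit <- glm(mpg ~ wt + hp, data = mtcars)
print(object.size(fit), units = "KB")

slim <- fit
slim$data <- NULL            # copy of the input data frame
slim$model <- NULL           # model frame
slim$residuals <- NULL
slim$fitted.values <- NULL
print(object.size(slim), units = "KB")

# Prediction on new data still works from the coefficients and terms
predict(slim, newdata = mtcars[1:3, ])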

Upvotes: 2
