Will

Reputation: 1

For loops in R and computational speed

In the past I have written R code that requires for loops nested inside other for loops. This code generally takes a long time to run. I have read online that this is a result of how for loops work in R. I have also read that writing the loops in another language called from R, e.g. C++ or Java, can reduce the run time.

Does anyone have experience with this and can point me to some simple examples that I can read?

Also, can you call another language for the for loop itself, but still have everything inside the loop written in standard R code?

Upvotes: 0

Views: 522

Answers (2)

Flounderer

Reputation: 652

I do have some experience of this as I worked on a project in which it was necessary to write some loops in C in order to speed up the code.

First, it's useful to note that there is a lot of information about R's for loops on the main Stack Overflow site. For example, the question Speed up the Loop Operation in R has at least two excellent answers which I found very helpful. Also, the R Inferno, as suggested by Roman Luštrik above, has a lot of good advice.

Assuming that you've vectorised everything that can be vectorised, removed as much as possible from inside the loops, worried about the fact that ( is a function call, and so on, you are asking: what to do next?

(Aside: as I understand it, from asking questions on various sites, R is written in C, and almost everything you write in R is a function call at the C level. This means that if you are doing things over and over again, you should make sure that your code makes as few function calls as possible, as these can really add up, particularly in a double for loop. That's why it's interesting that innocent-looking things like brackets are actually function calls.)

The first place you will be told to look when trying to extend R is the Writing R extensions manual. This hasn't worked out very well for me as it's not written with the casual R user in mind. Instead, I have found Matloff's book The Art of R Programming to be a lot more helpful. The link is to a pdf draft of the book; the book itself contains more detailed examples. In fact, I now see that the example is not included in the above pdf; sorry.

Anyway, it turns out that there are two ways to call C from R, called .C and .Call. Many people don't recommend using .C, but it has the advantage of being much easier to use, and the people who tend to discourage it also tend to be hard-core programmers.

There are numerous online tutorials on how to use the .C Interface with examples, for example this one from Simon Fraser University. Basically, you have to write the function you want to call in C, it has to have return type void and it has to accept pointers as arguments. I hadn't tried using C when I first started trying to learn this, and I learned what I needed to know from a book called C Programming in Easy Steps. Another good reference, which is available online for free, is the book Modeling with Data by Ben Klemens, which is a statistics textbook which uses C as the language of choice and assumes no prior knowledge. I found it very helpful for learning about pointers.
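To make the "void return type, pointer arguments" rule concrete, here is a minimal sketch of the kind of C function the .C interface expects. It is not taken from any of the tutorials mentioned above; the function name row_sums and its argument names are made up for illustration. It uses a double loop to compute the row sums of a matrix, relying on the fact that R stores matrices column by column and hands them to C as one flat array.

```c
/* Hypothetical example: row_sums and its argument names are illustrative only.
   The .C convention: the function returns void and every argument is a
   pointer, including scalars such as the matrix dimensions. */
void row_sums(double *m, int *nrow, int *ncol, double *out)
{
    int nr = *nrow;  /* dereference the scalar arguments once */
    int nc = *ncol;

    for (int i = 0; i < nr; i++) {
        double total = 0.0;
        for (int j = 0; j < nc; j++) {
            /* R matrices are column-major: element (i, j) is at i + j * nrow */
            total += m[i + j * nr];
        }
        out[i] = total;  /* results go back to R through the output pointer */
    }
}
```

Assuming the file is saved as row_sums.c, it would typically be compiled with R CMD SHLIB row_sums.c and loaded with dyn.load(). The call from R would then look something like .C("row_sums", as.double(m), as.integer(nrow(m)), as.integer(ncol(m)), out = double(nrow(m)))$out. The as.double() and as.integer() wrappers matter, because .C passes the data through as-is and the C side has no way to check the types.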

If you are trying to make examples of .C run "out of the box" then it helps to be working in a UNIX environment. I haven't been using one of those, and it is much harder to get things to work on Windows; I have a feeling that many people feel that using Windows is somehow evil and are reluctant to help those who use it, which is a pain if you are a data analyst who happens to have no other choice. Or possibly this is unfair, and people who use Windows are simply expected to be familiar with the command line.

I don't want to go into the details of getting .C to work on Windows, just in case you aren't using it. What I can say is that I know nothing about computers, but I did manage to do it, and so it can be done.

A newer alternative to using .C or .Call is the Rcpp package. Dirk Eddelbuettel, one of the package authors, is very active on Stack Exchange and is very likely to help you if you have questions about this package, or any other R/C interface. As recommended above by Roman Luštrik, this package is likely to be a very good option. I haven't used it myself as I have not yet been able to install it under Windows.

As for the final part of your question, asking whether you can write a for loop in a foreign language and then just use R code inside the loop, I am pretty sure, unfortunately, that there is no good way of doing this. It would be great if you could just skip over having to use R's for altogether, but I don't think it works this way. You can, however, use various R functions in the C code if you include the R.h header file. Again, it's hard to get this to work on Windows. In particular, you have to install a thing called Rtools. But once it's working, writing a small piece of C code is almost as easy as writing the corresponding R code.

Anyway, I hope some of these references are somewhat helpful. The best option to try first is to write the R code as efficiently as possible. Next, try Rcpp. If this doesn't work, or if you are doing something fairly minor, then I recommend .C. I am sure some experts will turn up with better advice, but I hope it's at least somewhat useful to have an answer from a non-programmer who has struggled with these issues.

Upvotes: 7

Glen_b

Reputation: 8252

If you're nesting loops in R, but haven't got enough experience of R to know when to use them and when not to, you're probably not taking full advantage of the language.

For loops in R are not always slower than other approaches, like apply, but there is one huge bugbear:

  • never grow an array inside a loop

Instead, make your arrays full-size before you loop and then fill them up.

Beyond that, R often has fast functions that operate on whole vectors, and using them can sometimes speed up code. Reorganizing your code can sometimes give substantial speed improvements as well.

You can go to tools like Rcpp and inline, which can be a big help, but often you can do a lot just in vanilla R.

More explicit help would depend on your specific problems.

Upvotes: 5
