pk_22

Reputation: 390

Is R code faster when written in fewer lines?

I have a question about R code. Is it true that a statement written in one line runs faster than the same statement spread over several lines? In other words, do fewer lines mean faster execution?

Example:

fileName = paste(directory, "fileTest.csv", sep="")
vars = read.csv(fileName, header=FALSE)
vars = as.matrix(vars)

or

vars = as.matrix(read.csv(paste(directory, "fileTest.csv", sep=""), header=FALSE))

I can imagine that it does not matter when this happens only once, but what if it happens many times in your code?

Upvotes: 1

Views: 194

Answers (1)

mindlessgreen

Reputation: 12132

Let's compare three functions: one that uses three lines, one that uses a single line, and one that uses pipes.

library(microbenchmark)
library(dplyr)
library(ggplot2)

directory <- getwd()
mat <- matrix(rnorm(n=20000),nrow=200)
write.table(mat,"matrix.txt",sep="\t")

# 3-line code: each step stored in an intermediate variable
fn1 <- function()
{
  fileName <- paste0(directory, "/matrix.txt")
  vars <- read.delim(fileName, header = TRUE)
  as.matrix(vars)
}

# 1-line code: the same calls nested in a single expression
fn2 <- function()
{
  as.matrix(read.delim(paste0(directory, "/matrix.txt"), header = TRUE))
}

# using pipe: the same calls chained with %>% (loaded with dplyr)
fn3 <- function()
{
  paste0(directory, "/matrix.txt") %>%
    read.delim(header = TRUE) %>%
    as.matrix()
}

Now run each function 1000 times, measure the run times, and plot the results.

mb <- microbenchmark::microbenchmark(fn1(),fn2(),fn3(),times=1000)
ggplot2::autoplot(mb)

[Figure: microbenchmark autoplot of run times for fn1(), fn2() and fn3()]

I don't think the difference in speed is significant. But there are other factors to consider (as mentioned in the comments), such as how much memory the intermediate variables use and how readable the code is.
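
To read numbers off a benchmark instead of judging the plot by eye, the timings can be summarised per function. This is only a minimal sketch, assuming the microbenchmark package is installed; the temporary file and the names multi_line and one_line are made up for illustration and stand in for fn1 and fn2 above.

```r
library(microbenchmark)

# Hypothetical test file in a temporary location (not the file used above)
path <- tempfile(fileext = ".txt")
write.table(matrix(rnorm(2000), nrow = 20), path, sep = "\t")

# Same work, written with and without an intermediate variable
multi_line <- function() {
  vars <- read.delim(path, header = TRUE)
  as.matrix(vars)
}
one_line <- function() as.matrix(read.delim(path, header = TRUE))

mb_small <- microbenchmark(multi_line(), one_line(), times = 100)

# One row per expression; timings reported in milliseconds
res <- summary(mb_small, unit = "ms")[, c("expr", "lq", "median", "uq")]
print(res)
```

As a rough rule, if the two medians differ by less than the gap between the lower and upper quartiles (lq and uq), the difference is within run-to-run noise.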

In my opinion, it is better to use extra lines for readability. This makes the code easier to edit and modify later on. Intermediate variables can also be helpful for debugging. If you have a lot going on, it is probably a good idea to remove variables that are no longer needed.
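
As a sketch of the last point, an intermediate object can be dropped with rm() once the final result exists. The object names here are illustrative only and do not come from the question.

```r
# Intermediate data frame, kept only long enough to build the matrix
vars_df <- data.frame(a = rnorm(1000), b = rnorm(1000))
vars <- as.matrix(vars_df)

# Report the size of the intermediate object before dropping it
print(object.size(vars_df), units = "Kb")

# Remove the intermediate; its memory can now be reclaimed by the garbage collector
rm(vars_df)
```

Inside a function such as fn1 this is rarely needed, because local variables are released when the function returns. It matters more for large objects created at the top level of a long script.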

Upvotes: 8
