Reputation: 390
I have a question about R code. Is it true that code runs faster if you write a statement in one line when it could also be spread over several lines? In other words, do fewer lines mean faster execution?
Example:
fileName = paste(directory, "fileTest.csv", sep = "")
vars = read.csv(fileName, header = FALSE)
vars = as.matrix(vars)
or
vars = as.matrix(read.csv(paste(directory, "fileTest.csv", sep = ""), header = FALSE))
I can imagine that it does not matter if it happens only once, but what if this pattern occurs many times in your code?
Upvotes: 1
Views: 194
Reputation: 12132
Let's compare three functions: a function with 3 lines, a function with one line and a function that uses pipes.
library(microbenchmark)
library(dplyr)
library(ggplot2)
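# Write a 200 x 100 matrix of random numbers to a tab-separated file
# in the current working directory
directory <- getwd()
mat <- matrix(rnorm(n = 20000), nrow = 200)
write.table(mat, "matrix.txt", sep = "\t")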
# 3-line code
fn1 <- function()
{
  fileName = paste0(directory, "/matrix.txt")
  vars = read.delim(fileName, header = TRUE)
  as.matrix(vars)
}
# 1-line code
fn2 <- function()
{
  as.matrix(read.delim(paste0(directory, "/matrix.txt"), header = TRUE))
}
# using pipe
fn3 <- function()
{
  paste0(directory, "/matrix.txt") %>%
    read.delim(., header = TRUE) %>%
    as.matrix()
}
Now, run each function 1000 times and measure run times. Plot the results.
mb <- microbenchmark::microbenchmark(fn1(),fn2(),fn3(),times=1000)
ggplot2::autoplot(mb)
I don't think the difference in speed is significant. But there are other factors (as mentioned in the comments), such as how much memory the intermediate variables use and how readable the code is.
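To check the same thing without the file read, here is a rough base-R sketch that times a multi-line and a one-line version of an in-memory conversion. The names `multi_line`, `one_line` and `time_it` are made up for this illustration and are not part of the benchmark above. Timings vary between machines and runs, so treat any small gap as noise rather than a result.

```r
# In-memory stand-in for the data read from disk (hypothetical example data)
df <- as.data.frame(matrix(rnorm(20000), nrow = 200))

# Multi-line version: uses intermediate variables
multi_line <- function() {
  tmp <- df
  m <- as.matrix(tmp)
  m
}

# One-line version: no intermediate variables
one_line <- function() as.matrix(df)

# Median elapsed time per call, measured over several batches of calls
time_it <- function(f, batches = 20, reps = 50) {
  elapsed <- replicate(
    batches,
    system.time(for (i in seq_len(reps)) f())[["elapsed"]]
  )
  median(elapsed) / reps
}

t_multi <- time_it(multi_line)
t_one <- time_it(one_line)

# Both versions return the same matrix
same_result <- identical(multi_line(), one_line())

cat("multi-line:", t_multi, "s per call\n")
cat("one-line:  ", t_one, "s per call\n")
```

Assigning to an intermediate name only binds a name to an existing object, so the extra lines add very little work compared with the conversion itself.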
In my opinion, it is better to use extra lines for the sake of readability. This makes the code easier to edit or modify later on. Intermediate variables can also be helpful when debugging. If you have a lot going on, it is probably a good idea to remove variables that are no longer needed.
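As a small illustration of cleaning up intermediates, here is a sketch using base R's `object.size()`, `rm()` and `exists()`. The names `vars_df`, `read_as_matrix` and `tmp` are hypothetical and only serve the example.

```r
# Hypothetical intermediate: a data frame only needed to build the matrix
vars_df <- as.data.frame(matrix(rnorm(20000), nrow = 200))
vars <- as.matrix(vars_df)

# Inspect how much memory the intermediate occupies
print(object.size(vars_df), units = "Kb")

# Drop the intermediate once it is no longer needed
rm(vars_df)
still_there <- exists("vars_df", inherits = FALSE)

# Intermediates created inside a function are local to that call,
# so they do not linger in the calling environment afterwards
read_as_matrix <- function(df) {
  tmp <- as.matrix(df)
  tmp
}
m <- read_as_matrix(as.data.frame(vars))
leaked <- exists("tmp", inherits = FALSE)
```

Wrapping multi-line steps in a function, as in `fn1` above, gives you readable intermediate variables without having to remove them by hand.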
Upvotes: 8