BigDataScientist

Reputation: 1084

How can I harness `apply/lapply/sapply` instead of a `for` loop to improve performance?

I would like to speed up the function below (`fndf`), which calls another function (`fn1`) once for each element of a character vector.

fndf - the new function
list_s - character vector - chr [1:400]
rdata_i - empty data frame (for initialization)
fn1 - another custom function
rdata2 - data frame with 3000 obs. of 40 variables
mdata - data frame
nm - character string

    fndf = function(list_s, rdata2){
        rdata_i <- data.frame(Date = as.Date(character()),
                              File = character(),
                              User = character(),
                              stringsAsFactors = FALSE)
        for(i in seq_along(list_s)){
            rdata <- fn1(list_s[i], rdata2)
            rdata_i <- rbind(rdata, rdata_i)
        }
        return(unique(rdata_i))
    }

Can we also improve performance of the function below?

    fn1 = function(nm, mdata){
        n0 <- mdata[mdata$Sign == nm, ]
        cn0 <- unique(n0$Name)
        repeat{
            n1c <- mdata[mdata$Mgr %in% cn0, ]
            n0 <- unique(rbind(n0, n1c))
            if(nrow(n1c) == 0){
                return(n0)
            }
            cn0 <- unique(n1c$Name)
        }
    }

Upvotes: 0

Views: 115

Answers (2)

Konrad Rudolph

Reputation: 545518

It’s indeed hard to say how best to transform your loop into an `*apply` call, and even harder to say whether that will speed it up. But fundamentally, the following transformation is what you’re after, and it definitely makes the function simpler and more readable. It also quite possibly yields a substantial performance gain by eliminating the repeated `rbind`, as noted by baptiste:

fndf = function (list_s, rdata2)
    as.data.frame(do.call(rbind, unique(lapply(list_s, fn1, rdata2))))

(Yes. That’s a single statement.)

Also note that I’m now applying the unique directly to the list rather than the data.frame. This changes the semantics – unique is specialised for data.frames – but is probably the right thing for your purposes, and it will be more efficient because it means that we don’t construct a needlessly big data.frame with redundant rows.
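A minimal sketch of that point, using toy data and a stand-in `fn1` (hypothetical, not the asker's actual data or function): when the same lookup key occurs more than once, deduplicating the *list* of per-element results collapses identical data frames before any `rbind` happens, so the combine step sees less data.

```r
# Toy data: "a" appears twice in list_s, so fn1 produces an identical
# data frame for both occurrences.
list_s <- c("a", "b", "a")
rdata2 <- data.frame(Sign = c("a", "a", "b"),
                     x    = c(1, 2, 3))
fn1 <- function(nm, mdata) mdata[mdata$Sign == nm, ]  # stand-in for fn1

pieces  <- lapply(list_s, fn1, rdata2)  # list of 3 data frames
deduped <- unique(pieces)               # the two "a" results collapse to one
length(pieces)   # 3
length(deduped)  # 2

# rbind is called once, on the already-smaller list of results.
combined <- do.call(rbind, deduped)
```

Note that `unique` on the list compares whole elements, so only results that are exactly identical collapse; dedup within or across differing result frames still happens at the end, via `unique` on the combined data frame if needed.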

Upvotes: 4

Troy

Reputation: 8691

It's hard to say without your data/functions, but here is a solution with plyr and some placeholder data:

list_s <- LETTERS
rdata2 <- data.frame(a = rep(LETTERS, 2), b = runif(52), c = runif(52) * 10)
fn1 <- function(a, b = rdata2) b[b$a == a, ]
fn1("A")

require(plyr) # for ldply, which takes a list and returns a data frame
result <- ldply(seq_along(list_s), function(x) fn1(list_s[x], rdata2))
head(result)

  a           b         c
1 A 0.281940237 2.7774933
2 A 0.023611392 0.6067029
3 B 0.456547803 9.4219258
4 B 0.645783746 5.3094864
5 C 0.475949523 4.8580622
6 C 0.006063407 2.5851738
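As a side note on the design choice (a sketch using the same placeholder data and `fn1` as above, not the asker's real objects): since `ldply` maps a function over each element of its input, you can pass the character vector directly and forward extra arguments after the function, instead of indexing by position.

```r
library(plyr)

# Same placeholder setup as in the answer above.
list_s <- LETTERS
rdata2 <- data.frame(a = rep(LETTERS, 2), b = runif(52), c = runif(52) * 10)
fn1 <- function(a, b = rdata2) b[b$a == a, ]

# ldply calls fn1(element, rdata2) for each letter and row-binds the results.
result2 <- ldply(list_s, fn1, rdata2)
```

This is equivalent to the indexed version but slightly more direct.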

Upvotes: 1
