Reputation: 939
I need to optimize a small piece of code, which can be simplified as follows. Say I have two data frames, and I want to obtain a "result" data frame that is a selection of rows of data2 satisfying some condition. To each selected row I need to add an identifier that corresponds to the row of the first data frame. This identifier is added to the resulting data frame as a column called "identity".
data = data.frame(a = sample(1:100, 100, replace = TRUE), b = sample(1:100, 100, replace = TRUE))
data2 = data.frame(a = sample(1:100, 100, replace = TRUE), b = sample(1:100, 100, replace = TRUE))
result = NULL
for (i in 1:nrow(data)) { # I loop on each row of "data"
  # if the difference between "a" in the current row of "data" and
  # the column "a" of "data2" is bigger than zero, we keep those rows of data2
  boolvect = data[i, "a"] - data2$a > 0
  ares = data2[boolvect, ]
  if (nrow(ares) > 0) {
    # we add an identifier for such an event; the identifier is the
    # row number of "data"
    ares$identity = i
    result = rbind(result, ares)
  }
}
I tried to use apply with margin 1. The results are the same but I don't know how to properly deal with the "identity" column.
all_df = apply(data, 1, function(x, data2) {
  val = as.numeric(x["a"])
  boolvect = val - data2$a > 0
  return(data2[boolvect, ])
}, data2 = data2)
result2 = do.call(rbind, all_df)
Any help please?
Upvotes: 0
Views: 25
Reputation: 389047
To get the identity column we need to iterate over the row indices of data. You can do this using lapply or Map.
result1 <- do.call(rbind, lapply(seq_along(data$a), function(i) {
  boolvect <- data$a[i] - data2$a > 0
  # returns NULL when nothing matches; rbind ignores NULL elements
  if (any(boolvect)) transform(data2[boolvect, ], identity = i)
}))
With Map:
result2 <- do.call(rbind, Map(function(x, y) {
  boolvect <- x - data2$a > 0
  if (any(boolvect)) transform(data2[boolvect, ], identity = y)
}, data$a, 1:nrow(data)))
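Since the condition only compares data$a[i] with data2$a, the per-row iteration can also be avoided entirely. The following is a minimal sketch of that idea, not part of the approaches above: it builds the full comparison matrix with outer and reads off the matching index pairs with which(arr.ind = TRUE). The names hits, idx and result_vec are introduced here for illustration, and the sketch assumes the comparison matrix (nrow(data2) by nrow(data)) fits in memory.

```r
set.seed(1)  # only so the example is reproducible
data  <- data.frame(a = sample(1:100, 100, replace = TRUE),
                    b = sample(1:100, 100, replace = TRUE))
data2 <- data.frame(a = sample(1:100, 100, replace = TRUE),
                    b = sample(1:100, 100, replace = TRUE))

# hits[j, i] is TRUE when data$a[i] - data2$a[j] > 0
hits <- outer(data2$a, data$a, "<")

# Index pairs of all TRUE cells, ordered by column (row of data) first,
# then by row (row of data2), which matches the order of the original loop
idx <- which(hits, arr.ind = TRUE)

result_vec <- data2[idx[, "row"], ]
result_vec$identity <- unname(idx[, "col"])
rownames(result_vec) <- NULL
```

Apart from row names, result_vec should contain the same rows, in the same order, as the loop in the question.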
Upvotes: 1
Reputation: 415
I would use lapply instead of apply and feed in the row indices for lapply to iterate over. Iterating over the indices is the simplest way for the function to "know what row it's on", because apply only passes the row's values, not its position.
all_df = lapply(1:nrow(data), function(x, data, data2) {
  boolvect = data[x, "a"] - data2$a > 0
  ares = data2[boolvect, ]
  if (nrow(ares) > 0) {
    ares$identity = x
  }
  return(ares)
}, data = data, data2 = data2)
result2 = dplyr::bind_rows(all_df)
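One caveat when iterating over 1:nrow(data): if data ever has zero rows, 1:0 evaluates to c(1, 0) rather than an empty sequence, so the function would be called with indices that do not exist. A small sketch of the difference, using a hypothetical empty data frame named empty:

```r
# hypothetical zero-row data frame with the same columns as "data"
empty <- data.frame(a = integer(0), b = integer(0))

bad  <- 1:nrow(empty)         # counts down from 1 to 0, i.e. two indices
good <- seq_len(nrow(empty))  # empty integer vector, so nothing is iterated

length(bad)
length(good)
```

Using seq_len(nrow(data)) in place of 1:nrow(data) is a drop-in change that makes the zero-row case safe.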
Upvotes: 1