Fabrizio

Reputation: 939

apply problem giving each element a different name

I need to optimize a small piece of code, simplified below. I have two data frames, and I want to build a "result" data frame that is a selection of rows of data2 meeting a condition. Each selected row needs an identifier giving the row number of the first data frame (data) it was compared against. This identifier is stored in the result as a column called "identity".

data=data.frame(a=sample(1:100, 100, replace=TRUE),b=sample(1:100, 100, replace=TRUE) )
data2=data.frame(a=sample(1:100, 100, replace=TRUE),b=sample(1:100, 100, replace=TRUE) )

result=NULL
for(i in 1:nrow(data)){  # I loop on each row of "data"
  # if the difference between the current row and the column "a"
  # of "data2" is bigger than zero we store the values of data2
  boolvect=data[i,"a"]-data2$a>0
  ares=data2[ boolvect,]
  if(nrow(ares)>0){
    # we add an identifier for such event, the identifier is the
    # row number of "data"
    ares$identity=i
    result=rbind(result,ares)
  }
}

I tried to use apply with MARGIN = 1. It selects the same rows as the loop, but I don't know how to properly add the "identity" column.

all_df=apply(data, 1, function(x, data2){
  val=as.numeric(x["a"])
  boolvect=val-data2$a>0
  return(data2[boolvect,])
}, data2=data2)

result2=do.call(rbind, all_df)

Any help please?

Upvotes: 0

Views: 25

Answers (2)

Ronak Shah

Reputation: 389047

To get the identity column we need to iterate over the index of data.

You can do this using lapply or Map.

result1 <- do.call(rbind, lapply(seq_along(data$a), function(i) {
  boolvect= data$a[i] - data2$a > 0
  if(any(boolvect)) transform(data2[boolvect, ], identity = i)
}))

With Map:

result2 <- do.call(rbind, Map(function(x, y) {
  boolvect = x - data2$a > 0
  if(any(boolvect)) transform(data2[boolvect, ], identity = y)
}, data$a, 1:nrow(data)))
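
As a quick sanity check, here is a minimal self-contained sketch that compares the lapply version against the loop from the question. The fixed seed and the names loop_result and lapply_result are my own assumptions for illustration; row names are reset before comparing because rbind generates them differently in the two approaches.

```r
set.seed(1)  # assumption: fixed seed so the comparison is reproducible
data  <- data.frame(a = sample(1:100, 100, replace = TRUE),
                    b = sample(1:100, 100, replace = TRUE))
data2 <- data.frame(a = sample(1:100, 100, replace = TRUE),
                    b = sample(1:100, 100, replace = TRUE))

# Reference: the loop from the question
loop_result <- NULL
for (i in seq_len(nrow(data))) {
  ares <- data2[data$a[i] - data2$a > 0, ]
  if (nrow(ares) > 0) {
    ares$identity <- i
    loop_result <- rbind(loop_result, ares)
  }
}

# lapply version: returns NULL when nothing matches, which rbind ignores
lapply_result <- do.call(rbind, lapply(seq_len(nrow(data)), function(i) {
  keep <- data$a[i] - data2$a > 0
  if (any(keep)) transform(data2[keep, ], identity = i)
}))

# Reset row names, then compare the two data frames
rownames(loop_result) <- NULL
rownames(lapply_result) <- NULL
all.equal(loop_result, lapply_result)
```

If the two approaches agree, all.equal() returns TRUE; otherwise it returns a description of the differences.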

Upvotes: 1

DataJack

Reputation: 415

I would use lapply instead of apply and pass in the row indices for lapply to iterate over. That is a straightforward way for the function to "know what row it's on".

all_df=lapply(1:nrow(data), function(x, data, data2){
  boolvect=data[x,"a"]-data2$a>0
  ares=data2[boolvect,]
  if(nrow(ares)>0){
    ares$identity=x
  }
  return(ares)
}, data=data, data2=data2)

result2=dplyr::bind_rows(all_df)
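
Since the goal was optimization, it may also be worth trying to avoid the per-row iteration altogether. The following is only a sketch of a different technique (a vectorized comparison with outer() and which(), not what either answer above uses), and the names hit, idx and vec_result are hypothetical. It builds the full nrow(data2) x nrow(data) logical matrix, so it assumes both data frames are small enough for that to fit in memory.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
data  <- data.frame(a = sample(1:100, 100, replace = TRUE),
                    b = sample(1:100, 100, replace = TRUE))
data2 <- data.frame(a = sample(1:100, 100, replace = TRUE),
                    b = sample(1:100, 100, replace = TRUE))

# hit[j, i] is TRUE when data$a[i] - data2$a[j] > 0
hit <- outer(data2$a, data$a, function(d2, d1) d1 - d2 > 0)

# One (row, col) pair per TRUE cell: row indexes data2, col indexes data
idx <- which(hit, arr.ind = TRUE)

# Pick the matching rows of data2 and attach the row number of data
vec_result <- cbind(data2[idx[, "row"], ], identity = idx[, "col"])
rownames(vec_result) <- NULL
```

Because which() walks the matrix column by column, the rows come out grouped by identity, in the same order the loop would produce them.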

Upvotes: 1
