agermanguy

Reputation: 93

Replace for loop by apply function

I've read some threads about the apply functions, but I'm still struggling to use them. I want to create a dummy variable in a data frame that takes the value 1 if the row's combination of two variable values also occurs in an observation of another data frame.

Creation of the two data frames:

df1 <- data.frame(c("A","C","E","F"),
                  c(17,24,5,8))
names(df1)[1] <- "Apple"
names(df1)[2] <- "Orange"
df1$Apple <- as.character(df1$Apple)

df1$Banana <- 0

df2 <- data.frame(c("Q","A","C","E"),
                  c(8,303,24,17))
names(df2)[1] <- "Tomato"
names(df2)[2] <- "Cucumber"
df2$Tomato <- as.character(df2$Tomato)

The only observation present in both data frames is "C", 24, which is in row 2 of df1 and row 3 of df2. I can extract this information with a for loop that subsets df2 to the rows whose first variable matches and then checks whether the value of the second variable occurs in that subset:

for (idx in 1:4) {
  df3 <- subset(df2, Tomato == df1$Apple[idx])
  df1$Banana[idx] <- df1$Orange[idx] %in% df3$Cucumber
}

which leads to the desired result:

> df1
  Apple Orange Banana
1     A     17      0
2     C     24      1
3     E      5      0
4     F      8      0

However, I'm not able to achieve the same result with the apply function:

Banana <- function(){
  df3 <- subset(df2, Tomato == df1$Apple)
  df1$Orange %in% df3$Cucumber
}

apply(X = df1, MARGIN = 1, FUN = Banana)

Instead I get the following error message:

Error in FUN(newX[, i], ...) : unused argument (newX[, i])

Does anyone know what I'm doing wrong here and how to use the function correctly?

Upvotes: 3

Views: 277

Answers (1)

Ronak Shah

Reputation: 388817

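One way with apply is to iterate over df1 row-wise, check whether any row of df2 has Tomato equal to the row's first value and Cucumber equal to its second, and convert the logical result to an integer. Note that apply converts the data frame to a character matrix first, so both comparisons are made on strings.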

df1$Banana <- as.integer(apply(df1, 1, function(x) 
                 any(x[1] == df2$Tomato & x[2] == df2$Cucumber)))
df1
#  Apple Orange Banana
#1     A     17      0
#2     C     24      1
#3     E      5      0
#4     F      8      0
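
As for the error in the question: apply passes each row to FUN as an argument, but Banana() was defined without any parameters, hence "unused argument". Because apply(df1, 1, ...) also turns every row into character strings (and numeric columns can end up padded with spaces), a variant that keeps the column types may be safer. The following is only a sketch under the question's example data; pair_exists, banana_keys, key1 and key2 are illustrative names that do not appear in the original post.

```r
# Example data as in the question
df1 <- data.frame(Apple = c("A", "C", "E", "F"),
                  Orange = c(17, 24, 5, 8),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Tomato = c("Q", "A", "C", "E"),
                  Cucumber = c(8, 303, 24, 17),
                  stringsAsFactors = FALSE)

# The function handed to an apply-family call must accept the values
# it receives. pair_exists is a hypothetical helper name.
pair_exists <- function(apple, orange) {
  any(df2$Tomato == apple & df2$Cucumber == orange)
}

# mapply walks both columns in parallel, so Orange stays numeric
# instead of being converted to a string as in apply(df1, 1, ...)
df1$Banana <- as.integer(mapply(pair_exists, df1$Apple, df1$Orange,
                                USE.NAMES = FALSE))

# Fully vectorised alternative: build one key per row and test membership.
# Assumes the default separator (a space) does not occur in Apple or Tomato.
key1 <- paste(df1$Apple, df1$Orange)
key2 <- paste(df2$Tomato, df2$Cucumber)
banana_keys <- as.integer(key1 %in% key2)

df1
```

Both variants should give the same Banana column as the for loop in the question; the key-based one avoids an explicit loop altogether.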

Upvotes: 1
