Madisonel

Reputation: 105

How to use lapply to subset list of dataframes based on dfs from a separate list

I'm starting with 2 lists (list_a and list_b), each of whose elements is a data frame. My objective is to create a new list of data frames: the first new df will contain all the rows from list_a[[1]] that match rows from list_b[[1]], the second will do the same for list_a[[2]] and list_b[[2]], and so forth. The code works when I apply it manually to one pair, but I get an error message when I try to use lapply.

Reproducible example: 2 lists, each with 2 elements of class df

List of df_a to use for this example

df_a1 <- data.frame(X = c(17,17,18,18), Y=c(105,106,108,109), 
Z=c(3,4,4,6))
df_a2 <- data.frame(X = c(17,17,18,18), Y=c(105,106,108,109), 
Z=c(5,5,4,5))
list_a <- list(df_a1,df_a2)
df_a_list_names<-c("control", "variable")
names(list_a)<-gsub("\\.swc$", "",df_a_list_names)

df_b1 <- data.frame(X= c(17,17,17,18), Y = c(105,106,107,105), 
Z=c(3,4,6,7), I=c(50,50,50,50))
df_b2 <- data.frame(X = c(17,17,17,17), Y = c(105,106,107,108), 
Z=c(5,5,6,7), I=c(75,75,75,75))
list_b <- list(df_b1,df_b2)
df_b_list_names<-c("control", "variable")
names(list_b)<-gsub("\\.txt$", "",df_b_list_names)

code that works when applied manually

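library(dplyr)  # provides %>%, semi_join() and arrange()

list_a[[1]]->fobA  
list_b[[1]]->fobB

# keep rows of fobB whose X, Y and Z values each occur in fobA
new.df<-fobB%>%semi_join(fobA,by="X")%>%
semi_join(fobA,by="Y")%>%
semi_join(fobA,by="Z")
arrange(new.df, Z)->final.df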

Results from running manual without lapply

'data.frame':	2 obs. of  4 variables:
 $ X: num  17 17
 $ Y: num  105 106
 $ Z: num  3 4
 $ I: num  50 50

Modified above as a function

fxn3<-function(x){
new.df<-list_b%>%semi_join(list_a,by="X")%>%
semi_join(list_a,by="Y")%>%
semi_join(list_a,by="Z")
arrange(new.df, Z)->final.df
return(final.df)
}

Here I tried using lapply with custom function

lapply(list_a, "fxn3")->fob.final.list

I received the below error message

Error in UseMethod("semi_join") : no applicable method for 'semi_join' applied to an object of class "list"

Upvotes: 1

Views: 402

Answers (3)

jay.sf

Reputation: 73692

The lapply solution requested by OP would look like this.

lapply(1:2, function(x) merge(list_b[[x]], list_a[[x]]))
# [[1]]
#    X   Y Z  I
# 1 17 105 3 50
# 2 17 106 4 50
# 
# [[2]]
#    X   Y Z  I
# 1 17 105 5 75
# 2 17 106 5 75
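
As a minimal sketch of the same idea (not part of the original answer), the index vector can be built with seq_along() so it works for any list length, and the names can be copied over from list_b. The explicit by argument and the name res are assumptions added for illustration. Note that merge() is an inner join, so unlike semi_join() it can repeat rows of list_b if the matching data frame in list_a contains duplicate keys.

```r
# Data from the question
list_a <- list(
  control  = data.frame(X = c(17, 17, 18, 18), Y = c(105, 106, 108, 109), Z = c(3, 4, 4, 6)),
  variable = data.frame(X = c(17, 17, 18, 18), Y = c(105, 106, 108, 109), Z = c(5, 5, 4, 5))
)
list_b <- list(
  control  = data.frame(X = c(17, 17, 17, 18), Y = c(105, 106, 107, 105),
                        Z = c(3, 4, 6, 7), I = c(50, 50, 50, 50)),
  variable = data.frame(X = c(17, 17, 17, 17), Y = c(105, 106, 107, 108),
                        Z = c(5, 5, 6, 7), I = c(75, 75, 75, 75))
)

# Loop over positions so that element i of list_b is paired with element i of list_a
res <- lapply(seq_along(list_b), function(i) {
  merge(list_b[[i]], list_a[[i]], by = c("X", "Y", "Z"))
})

# lapply() over an index vector drops the names, so restore them
names(res) <- names(list_b)
```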

Upvotes: 0

akrun

Reputation: 887881

Here, we need to do the join on the corresponding datasets in each list, so we can use map2

library(tidyverse)
map2(list_b, list_a, semi_join)
#$control
#   X   Y Z  I
#1 17 105 3 50
#2 17 106 4 50

#$variable
#   X   Y Z  I
#1 17 105 5 75
#2 17 106 5 75

NOTE: Here, we first showed the map2 option


In base R, we can use Map

Map(merge, list_b, list_a)

Upvotes: 1

Humpelstielzchen

Reputation: 6441

You don't have to call semi_join() three times; you can join on all three columns in one step:

library(tidyverse)
map2(.x = list_b, .y = list_a, ~  semi_join(.x, .y, by=c("X", "Y", "Z")))
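
For reference, the error in the question comes from fxn3 ignoring its argument x and passing the whole lists list_b and list_a to semi_join(), which only has methods for data frames. A rough sketch of how the function could be rewritten to take one data frame from each list is below; the name fxn3_fixed and its argument names are hypothetical, and it joins on all three columns at once as suggested above.

```r
library(dplyr)

# Data from the question
list_a <- list(
  control  = data.frame(X = c(17, 17, 18, 18), Y = c(105, 106, 108, 109), Z = c(3, 4, 4, 6)),
  variable = data.frame(X = c(17, 17, 18, 18), Y = c(105, 106, 108, 109), Z = c(5, 5, 4, 5))
)
list_b <- list(
  control  = data.frame(X = c(17, 17, 17, 18), Y = c(105, 106, 107, 105),
                        Z = c(3, 4, 6, 7), I = c(50, 50, 50, 50)),
  variable = data.frame(X = c(17, 17, 17, 17), Y = c(105, 106, 107, 108),
                        Z = c(5, 5, 6, 7), I = c(75, 75, 75, 75))
)

# Hypothetical rewrite of fxn3: takes one data frame from each list
fxn3_fixed <- function(df_a, df_b) {
  df_b %>%
    semi_join(df_a, by = c("X", "Y", "Z")) %>%  # keep rows of df_b with a match in df_a
    arrange(Z)
}

# Map() walks both lists in parallel and takes names from the first list
fob.final.list <- Map(fxn3_fixed, list_a, list_b)
```

The same function should also work with purrr, e.g. map2(list_a, list_b, fxn3_fixed).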

Upvotes: 3
