Subsetting a dataframe according to a list of vectors

Question

I have a list of character vectors, called l. For example:

set.seed(42)  ## for sake of reproducibility
genes <- paste0("gene", 1:20)
tot <- data.frame(term = sample(genes, 30, replace = TRUE),
                  num = sample(1:10, 30, replace = TRUE),
                  stringsAsFactors = FALSE)
s1 <- sample(genes, 2, replace = FALSE)
s2 <- sample(genes, 4, replace = FALSE)
s3 <- sample(genes, 3, replace = FALSE)
s4 <- sample(genes, 2, replace = FALSE)
s5 <- sample(genes, 2, replace = FALSE)
s6 <- sample(genes, 3, replace = FALSE)
l <- list(s1, s2, s3, s4, s5, s6)

Evaluating tot[tot$term %in% l[[1]], ] gives:

      term num
 1  gene17   4
 3   gene1   6
 7  gene17   2
 26  gene1   6

If I then run

 df=tot[tot$term%in%l[[1]],]
 sum(df$num)

I obtain the sum of the second column, i.e. 18. For the other elements of the list I obtain, respectively: 32 13 19 17 29. This can be achieved with a for loop:

v<-vector()
for (j in 1:length(l)) {
  df=tot[tot$term%in%l[[j]],]
  v<-c(v,sum(df$num))
}

I would like to know if there is a simpler way of doing this.

akrun · Accepted Answer

It can be simplified with sapply:

v2 <- sapply(l, function(j) sum(tot$num[tot$term %in% j]))

Checking against the OP's loop output:

identical(v, v2)
#[1] TRUE
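
As a hedged aside, vapply is a stricter variant of sapply that declares the type and length of each result up front, so an unexpected return value raises an error instead of silently changing the output shape. A minimal sketch, assuming a small made-up data frame (toy_tot and toy_l are illustrative names, not the OP's data):

```r
# Toy data, assumed for illustration only (not the OP's random sample)
toy_tot <- data.frame(term = c("gene1", "gene2", "gene1", "gene3"),
                      num  = c(4L, 6L, 2L, 5L),
                      stringsAsFactors = FALSE)
toy_l <- list(c("gene1", "gene3"), "gene2", "gene4")

# For each vector of terms, sum num over the rows whose term is in that vector.
# integer(1) declares that every call must return exactly one integer.
sums <- vapply(toy_l,
               function(g) sum(toy_tot$num[toy_tot$term %in% g]),
               integer(1))
print(sums)
```

A vector with no matching terms (here "gene4") contributes a sum of 0 rather than NA, which is the same behaviour as the sapply version.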

Or, more compactly, with map_dbl from purrr:

library(purrr)
map_dbl(l, ~ sum(tot$num[tot$term %in% .x]))

Or with the tidyverse:

library(dplyr)
stack(setNames(l, seq_along(l))) %>% 
  group_by(ind) %>% 
  summarise(Sum = tot %>% 
                    filter(term %in% values) %>%
                    pull(num) %>% 
                    sum) %>%
  pull(Sum)
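
The same reshape-then-aggregate idea can also be sketched in base R with a join instead of a per-group filter. This is an alternative technique, not the approach used above: it stacks the list into long form, merges it with the data frame on the term, and sums per list element with xtabs. It assumes no term is repeated within a single vector (a repeated term would be counted twice by the join, unlike %in%). The toy data and names below are made up for illustration:

```r
# Toy data, assumed for illustration only
toy_tot <- data.frame(term = c("gene1", "gene2", "gene1", "gene3"),
                      num  = c(4L, 6L, 2L, 5L),
                      stringsAsFactors = FALSE)
toy_l <- list(c("gene1", "gene3"), "gene2", "gene4")

# Long format: one row per (term, list index); 'ind' is a factor of indices
long <- stack(setNames(toy_l, seq_along(toy_l)))

# Inner join on the term; list elements without matches keep their factor level
merged <- merge(long, toy_tot, by.x = "values", by.y = "term")

# Sum num per list index; xtabs fills empty groups with 0
sums_by_group <- as.vector(xtabs(num ~ ind, data = merged))
print(sums_by_group)
```

The join does the matching once for the whole list, which may help when the list is long, at the cost of building an intermediate merged table.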
