Mark

Reputation: 1769

Subsetting a dataframe according to a list of vectors

I have a list of character vectors, called l. For example:

set.seed(42)  ## for sake of reproducibility
genes <- paste("gene", 1:20, sep = "")
tot <- data.frame(term = sample(genes, 30, replace = TRUE),
                  num = sample(1:10, 30, replace = TRUE),
                  stringsAsFactors = FALSE)
s1 <- sample(genes, 2, replace = FALSE)
s2 <- sample(genes, 4, replace = FALSE)
s3 <- sample(genes, 3, replace = FALSE)
s4 <- sample(genes, 2, replace = FALSE)
s5 <- sample(genes, 2, replace = FALSE)
s6 <- sample(genes, 3, replace = FALSE)
l <- list(s1, s2, s3, s4, s5, s6)

Subsetting with tot[tot$term %in% l[[1]], ], I obtain:

      term num
 1  gene17   4
 3   gene1   6
 7  gene17   2
 26  gene1   6

If I then put

 df=tot[tot$term%in%l[[1]],]
 sum(df$num)

I obtain the sum of the second column, i.e. 18. For the other elements of the list I obtain, respectively: 32, 13, 19, 17 and 29. This can be achieved with a for loop:

v<-vector()
for (j in 1:length(l)) {
  df=tot[tot$term%in%l[[j]],]
  v<-c(v,sum(df$num))
}

I would like to know if there is a simpler way of doing this.

Upvotes: 1

Views: 40

Answers (2)

Ronak Shah

Reputation: 388807

Here is one tidyverse way:

library(tidyverse)

enframe(l, value = 'term') %>%
  unnest(term) %>%
  left_join(tot, by = 'term') %>%
  group_by(name) %>%
  summarise(num = sum(num, na.rm = TRUE))

#   name   num
#* <int> <int>
#1     1    18
#2     2    32
#3     3    13
#4     4    19
#5     5    17
#6     6    29
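For readers without the tidyverse, the same reshape, join and aggregate idea can be sketched in base R. This is only an illustration under stated assumptions: tot_demo and l_demo are small hypothetical stand-ins for tot and l (they are not the data from the question), chosen so the sums can be checked by hand.

```r
# Hypothetical toy data standing in for `tot` and `l` from the question.
tot_demo <- data.frame(
  term = c("gene1", "gene2", "gene1", "gene3", "gene4"),
  num  = c(4L, 6L, 2L, 5L, 1L),
  stringsAsFactors = FALSE
)
l_demo <- list(c("gene1", "gene3"), c("gene2", "gene4"), "gene5")

# Long format: one row per (list element index, term), like enframe() + unnest().
long <- data.frame(
  name = rep(seq_along(l_demo), lengths(l_demo)),
  term = unlist(l_demo),
  stringsAsFactors = FALSE
)

# Left join onto the data; terms absent from tot_demo get num = NA.
joined <- merge(long, tot_demo, by = "term", all.x = TRUE)

# Sum num within each list element, ignoring the NAs from unmatched terms.
sums <- tapply(joined$num, joined$name, sum, na.rm = TRUE)
sums
```

One caveat with any join-based version: if a term were repeated inside a single list element, its rows would be counted once per repetition, whereas %in% counts them once. The vectors in the question are sampled without replacement, so the two approaches agree there.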

Upvotes: 1

akrun

Reputation: 886938

It can be simplified with sapply:

v2 <- sapply(l, function(j) sum(tot$num[tot$term %in% j]))

Checking against the OP's loop output:

identical(v, v2)
#[1] TRUE
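A possible variant of the same idea uses vapply, which declares the expected type and length of each result and fails loudly if a result does not match. The sketch below is self-contained and uses hypothetical toy data (tot_demo and l_demo are illustrative names, not the objects from the question) so that the sums can be verified by hand.

```r
# Hypothetical toy data standing in for `tot` and `l` from the question.
tot_demo <- data.frame(
  term = c("gene1", "gene2", "gene1", "gene3", "gene4"),
  num  = c(4L, 6L, 2L, 5L, 1L),
  stringsAsFactors = FALSE
)
l_demo <- list(c("gene1", "gene3"), c("gene2", "gene4"), "gene5")

# For each character vector g, sum num over the rows whose term is in g.
# FUN.VALUE = integer(1) asserts that every result is a single integer.
v_demo <- vapply(
  l_demo,
  function(g) sum(tot_demo$num[tot_demo$term %in% g]),
  FUN.VALUE = integer(1)
)
v_demo
```

A list element with no matching terms (here "gene5") yields 0 rather than NA, because the sum of an empty integer vector is 0.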

Or more compactly with map_dbl from purrr:

library(purrr)
map_dbl(l, ~ sum(tot$num[tot$term %in% .x]))

Or with dplyr:

library(dplyr)
stack(setNames(l, seq_along(l))) %>% 
  group_by(ind) %>% 
  summarise(Sum = tot %>% 
                    filter(term %in% values) %>%
                    pull(num) %>% 
                    sum) %>%
  pull(Sum)

Upvotes: 2
