lana

Reputation: 131

How to remove common elements present in >= 3 out of 5 data frames

Following up on a previous question, and to carry out further statistical analysis, I would like to know whether it is possible to remove the common peaks present in >= 3 data frames.

a <- data.frame(ID = c("1", "2", "3", "4", "5"), peak = c("peak1", "peak2", "peak3", "peak4", "peak10"))
b <- data.frame(ID = c("1", "2", "3", "4"), peak = c("peak1","peak3", "peak20", "peak21"))
c <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak5", "peak3"))
d <- data.frame(ID = c("1", "2", "3", "4", "5", "6"),peak = c("peak1", "peak3", "peak7", "peak8", "peak11", "peak12"))
e <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak3",  "peak9"))

I would like to remove every peak that is present in >= 3 data frames. The desired output is:

a <- data.frame(ID = c("2", "4", "5"), peak = c("peak2", "peak4", "peak10"))
b <- data.frame(ID = c("3", "4"), peak = c("peak20", "peak21"))
c <- data.frame(ID = "2", peak = "peak5")
d <- data.frame(ID = c("3", "4", "5", "6"), peak = c("peak7", "peak8", "peak11", "peak12"))
e <- data.frame(ID = "3", peak = "peak9")
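Put as code, the rule counts in how many data frames each peak occurs (at most once per data frame) and then drops the peaks whose count is 3 or more. A minimal base R sketch of the counting step, repeating the example data so it runs on its own; the names frames, presence and common are only illustrative:

```r
# Example data from the question
a <- data.frame(ID = c("1", "2", "3", "4", "5"), peak = c("peak1", "peak2", "peak3", "peak4", "peak10"))
b <- data.frame(ID = c("1", "2", "3", "4"), peak = c("peak1", "peak3", "peak20", "peak21"))
c <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak5", "peak3"))
d <- data.frame(ID = c("1", "2", "3", "4", "5", "6"), peak = c("peak1", "peak3", "peak7", "peak8", "peak11", "peak12"))
e <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak3", "peak9"))
frames <- list(a = a, b = b, c = c, d = d, e = e)

# unique() makes each data frame count at most once per peak
presence <- table(unlist(lapply(frames, function(x) unique(x$peak))))

# Peaks present in 3 or more data frames (here peak1 and peak3, found in all 5)
common <- names(presence)[presence >= 3]
common
```

Every other peak occurs in exactly one data frame, so only peak1 and peak3 are removed.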

Upvotes: 2

Views: 56

Answers (3)

ThomasIsCoding

Reputation: 102241

Another base R option

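lst <- list(a, b, c, d, e)
# names of the peaks that occur fewer than 3 times across all data frames
v <- names(Filter(function(x) x < 3, table(unlist(sapply(lst, `[[`, "peak")))))
# keep only those peaks in each data frame
lapply(
  lst,
  function(x) {
    subset(x, peak %in% v)
  }
)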

which gives:

[[1]]
  ID   peak
2  2  peak2
4  4  peak4
5  5 peak10

[[2]]
  ID   peak
3  3 peak20
4  4 peak21

[[3]]
  ID  peak
2  2 peak5

[[4]]
  ID   peak
3  3  peak7
4  4  peak8
5  5 peak11
6  6 peak12

[[5]]
  ID  peak
3  3 peak9
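The one-liner nests several steps. Written out one at a time, it looks like the sketch below; it uses lapply() in place of sapply(), which makes no difference once the result is unlisted, and the intermediate names all_peaks and counts are only illustrative. The comment on step 3 notes that Filter() over the count table is equivalent to plain logical indexing.

```r
# Example data from the question
a <- data.frame(ID = c("1", "2", "3", "4", "5"), peak = c("peak1", "peak2", "peak3", "peak4", "peak10"))
b <- data.frame(ID = c("1", "2", "3", "4"), peak = c("peak1", "peak3", "peak20", "peak21"))
c <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak5", "peak3"))
d <- data.frame(ID = c("1", "2", "3", "4", "5", "6"), peak = c("peak1", "peak3", "peak7", "peak8", "peak11", "peak12"))
e <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak3", "peak9"))
lst <- list(a, b, c, d, e)

# 1. Pull the 'peak' column out of every data frame into one vector
all_peaks <- unlist(lapply(lst, `[[`, "peak"))

# 2. Count how often each peak occurs across all data frames
counts <- table(all_peaks)

# 3. Peaks seen fewer than 3 times; same result as
#    names(Filter(function(x) x < 3, counts))
v <- names(counts)[counts < 3]

# 4. Keep only those rows in each data frame
res <- lapply(lst, function(x) subset(x, peak %in% v))
res[[1]]
```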

Upvotes: 1

Onyambu

Reputation: 79288

In base R you could do:

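my_list <- list(a = a, b = b, c = c, d = d, e = e)
# TRUE for every peak that occurs fewer than 3 times across all data frames
y <- table(do.call(rbind, my_list)$peak) < 3
# keep the rows whose peak is TRUE in y and write the results back to a, b, ...
list2env(lapply(my_list, function(x) subset(x, y[peak])), .GlobalEnv)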

Now print a or any of the other data frames to see the result.

It is wiser to keep the results in a list, i.e. lapply(my_list, function(x) subset(x, y[peak])), instead of writing them into the global environment.
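A sketch of the list-based variant, repeating the question's data so it runs on its own. One detail matters: table() has to be applied to the peak column alone. On the whole row-bound data frame it would cross-tabulate ID against peak, and a two-dimensional table has no names to match, so y[peak] would return NA and every row would be dropped. The as.character() call is an addition, not part of the answer: it guards against factor columns, because indexing with a factor uses its integer codes rather than its labels.

```r
# Example data from the question
a <- data.frame(ID = c("1", "2", "3", "4", "5"), peak = c("peak1", "peak2", "peak3", "peak4", "peak10"))
b <- data.frame(ID = c("1", "2", "3", "4"), peak = c("peak1", "peak3", "peak20", "peak21"))
c <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak5", "peak3"))
d <- data.frame(ID = c("1", "2", "3", "4", "5", "6"), peak = c("peak1", "peak3", "peak7", "peak8", "peak11", "peak12"))
e <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak3", "peak9"))
my_list <- list(a = a, b = b, c = c, d = d, e = e)

# Logical table named by peak: TRUE when the peak occurs fewer than 3 times
y <- table(do.call(rbind, my_list)$peak) < 3

# Look each row's peak up in y by name; results stay in a named list
res <- lapply(my_list, function(x) subset(x, y[as.character(peak)]))
res$b
```

The original objects a to e are left unchanged; res holds the filtered copies.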

Upvotes: 3

akrun

Reputation: 887501

If the 'peak' values are unique within each dataset, bind the datasets into a single data frame (bind_rows), count each 'peak', keep the rows where 'n' is less than 3, and pull those 'peak' values:

library(dplyr)
to_keep <- bind_rows(a, b, c, d, e, .id = 'grp') %>%
             count(peak) %>% 
             filter(n < 3) %>% 
             pull(peak)
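The uniqueness condition matters because count(peak) after bind_rows() counts rows, not data frames, so a peak repeated inside one data frame is over-counted. A base R illustration with hypothetical toy data x and y; in the dplyr pipeline, inserting distinct(grp, peak) before count(peak) should have the same effect as the unique() step here.

```r
# Hypothetical toy data: "p1" occurs three times, but only in x
x <- data.frame(peak = c("p1", "p1", "p1", "p2"))
y <- data.frame(peak = c("p2", "p3"))
frames <- list(x = x, y = y)

# Row counts: "p1" reaches 3 although it is in a single data frame
row_counts <- table(unlist(lapply(frames, `[[`, "peak")))

# Data-frame counts: deduplicate inside each frame before counting
frame_counts <- table(unlist(lapply(frames, function(f) unique(f$peak))))

names(row_counts)[row_counts < 3]     # "p1" is wrongly excluded
names(frame_counts)[frame_counts < 3] # all three peaks are kept
```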

Then update the objects in the global environment (not recommended) with assign, after subsetting each one to the peak values in 'to_keep':

for (obj in letters[1:5]) {
  assign(obj, subset(get(obj), peak %in% to_keep))
}

Or keep the objects in a list and subset from there

library(purrr)
lst1 <- lst(a, b, c, d, e) %>%
           map(~ .x %>%
                  filter(peak %in% to_keep))
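If the loose objects a to e are the starting point, mget() can collect them into a named list without touching the global environment and without dplyr or purrr. A sketch, with to_keep computed in base R so the block runs on its own; the name counts is only illustrative.

```r
# Example data from the question
a <- data.frame(ID = c("1", "2", "3", "4", "5"), peak = c("peak1", "peak2", "peak3", "peak4", "peak10"))
b <- data.frame(ID = c("1", "2", "3", "4"), peak = c("peak1", "peak3", "peak20", "peak21"))
c <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak5", "peak3"))
d <- data.frame(ID = c("1", "2", "3", "4", "5", "6"), peak = c("peak1", "peak3", "peak7", "peak8", "peak11", "peak12"))
e <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak3", "peak9"))

# Peaks that occur fewer than 3 times in total (base R stand-in for the dplyr pipeline)
counts <- table(unlist(lapply(list(a, b, c, d, e), `[[`, "peak")))
to_keep <- names(counts)[counts < 3]

# mget() looks up a, b, c, d, e by name in the current environment
# and returns them as a named list; the originals are not modified
res <- lapply(mget(letters[1:5], envir = environment()),
              function(x) subset(x, peak %in% to_keep))
res$d
```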

Upvotes: 2
