Following on from the previous problem, and to carry out further statistical analysis, I would like to know whether it is possible to remove common peaks that are present in >= 3 data frames. Given:
a <- data.frame(ID = c("1", "2", "3", "4", "5"), peak = c("peak1", "peak2", "peak3", "peak4", "peak10"))
b <- data.frame(ID = c("1", "2", "3", "4"), peak = c("peak1","peak3", "peak20", "peak21"))
c <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak5", "peak3"))
d <- data.frame(ID = c("1", "2", "3", "4", "5", "6"),peak = c("peak1", "peak3", "peak7", "peak8", "peak11", "peak12"))
e <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak3", "peak9"))
I would like to remove every peak that is present in >= 3 data frames, with this desired output (the ID of each remaining row is kept):
a <- data.frame(ID = c("2", "4", "5"), peak = c("peak2", "peak4", "peak10"))
b <- data.frame(ID = c("3", "4"), peak = c("peak20", "peak21"))
c <- data.frame(ID = c("2"), peak = c("peak5"))
d <- data.frame(ID = c("3", "4", "5", "6"), peak = c("peak7", "peak8", "peak11", "peak12"))
e <- data.frame(ID = c("3"), peak = c("peak9"))
Upvotes: 2
Another base R option
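lst <- list(a, b, c, d, e)
# Peaks that occur fewer than 3 times across all data frames are the ones to keep
v <- names(Filter(function(x) x < 3, table(unlist(sapply(lst, `[[`, "peak")))))
# Keep only the rows whose peak is in that set
lapply(
  lst,
  function(x) {
    subset(x, peak %in% v)
  }
)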
which gives
[[1]]
ID peak
2 2 peak2
4 4 peak4
5 5 peak10
[[2]]
ID peak
3 3 peak20
4 4 peak21
[[3]]
ID peak
2 2 peak5
[[4]]
ID peak
3 3 peak7
4 4 peak8
5 5 peak11
6 6 peak12
[[5]]
ID peak
3 3 peak9
Upvotes: 1
In base R you could do:
my_list <- list(a = a, b = b, c = c, d = d, e = e)
y <- table(do.call(rbind, my_list)$peak) < 3
list2env(lapply(my_list, function(x) subset(x, y[peak])), .GlobalEnv)
Now call a or any of the other data frames.
It would be wiser to keep the results in a list, i.e. lapply(my_list, function(x) subset(x, y[peak])), instead of assigning them into the global environment.
Upvotes: 3
If the 'peak' values are unique in each dataset, bind the datasets together into a single data frame (bind_rows), get the count of 'peak', filter the rows where 'n' is less than 3, and pull those 'peak' elements:
library(dplyr)
to_keep <- bind_rows(a, b, c, d, e, .id = 'grp') %>%
  count(peak) %>%
  filter(n < 3) %>%
  pull(peak)
Now we update the objects in the global env (not recommended): use assign after subsetting those elements based on the peak values from 'to_keep'
for (obj in letters[1:5]) {
  assign(obj, subset(get(obj), peak %in% to_keep))
}
Or keep the objects in a list and subset from there
library(purrr)
lst1 <- lst(a, b, c, d, e) %>%
  map(~ .x %>%
        filter(peak %in% to_keep))
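The approach above relies on each 'peak' value being unique within a dataset. If a peak could be repeated inside a single data frame, counting rows would overstate how many data frames contain it. Here is a minimal base R sketch of one way around that, assuming the five data frames from the question. The names frames, n_frames, to_drop and filtered are only illustrative. It counts each peak at most once per data frame before filtering:

```r
# Data frames from the question
a <- data.frame(ID = c("1", "2", "3", "4", "5"), peak = c("peak1", "peak2", "peak3", "peak4", "peak10"))
b <- data.frame(ID = c("1", "2", "3", "4"), peak = c("peak1", "peak3", "peak20", "peak21"))
c <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak5", "peak3"))
d <- data.frame(ID = c("1", "2", "3", "4", "5", "6"), peak = c("peak1", "peak3", "peak7", "peak8", "peak11", "peak12"))
e <- data.frame(ID = c("1", "2", "3"), peak = c("peak1", "peak3", "peak9"))

# Keep everything in a named list rather than in the global environment
frames <- list(a = a, b = b, c = c, d = d, e = e)

# For each peak, count the number of data frames it appears in;
# unique() makes a peak repeated within one data frame count only once
n_frames <- table(unlist(lapply(frames, function(x) unique(as.character(x$peak)))))

# Peaks present in 3 or more data frames are the ones to remove
to_drop <- names(n_frames)[n_frames >= 3]

# Drop those rows from every data frame, keeping the original IDs
filtered <- lapply(frames, function(x) x[!(x$peak %in% to_drop), , drop = FALSE])
```

Individual results can then be read from the list, e.g. filtered$a or filtered[["d"]].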
Upvotes: 2