Reputation: 3407
I have a tibble (or data frame, if you like) that is 19 columns of pure numerical data and I want to filter it down to only the rows where at least one value is above or below a threshold. I prefer a tidyverse/dplyr solution but whatever works is fine.
This is related to this question but is distinct in at least two ways that I can see:
Here are attempts I've tried:
data %>% filter(max(.) < 8)
data %>% filter(max(value) < 8)
data %>% slice(which.max(.))
Upvotes: 1
Views: 3058
Reputation: 887851
We can use base R methods:
data[Reduce(`|`, lapply(data, `>`, threshold)), ]
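To illustrate how this works, here is a small sketch on made-up data (the data frame `df` and the `threshold` value are assumptions for the example, not taken from the question). `lapply` produces one logical vector per column, and `Reduce` combines them element-wise with `|`, so a row is kept if any of its columns exceeds the threshold.

```r
# Toy data; `df` and `threshold` are made up for illustration
df <- data.frame(a = c(1, 9, 3), b = c(2, 4, 3), c = c(5, 1, 10))
threshold <- 8

# One logical vector per column: is each value above the threshold?
above <- lapply(df, `>`, threshold)

# Combine the per-column results with element-wise OR
keep <- Reduce(`|`, above)

# Keep only the rows where at least one column was above the threshold
result <- df[keep, ]
```

Swapping `>` for `<` should give the "below a threshold" variant in the same way.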
Upvotes: 0
Reputation: 90
Maybe there are better and more efficient ways, but these two functions should do what you need if I understood correctly. This solution assumes you have only numerical data.
library(dplyr)
library(purrr)
library(tibble)

# Random Data -------------------------------------------------------------
data <- as_tibble(replicate(10, runif(20)), .name_repair = "unique")
# Thresholds to be used ---------------------------------------------------
max_threshold <- 0.9
min_threshold <- 0.1
# Lesser_max: keep rows whose largest value is below max_threshold --------
lesser_max <- function(data, max_threshold = 0.9) {
  # Transpose so each original row becomes a column, then take its maximum
  row_max <-
    data %>%
    t() %>%
    as_tibble(.name_repair = "unique") %>%
    map_dbl(max) %>%
    unname()
  data[row_max < max_threshold, ]
}
# Greater_min: keep rows whose smallest value is above min_threshold ------
greater_min <- function(data, min_threshold = 0.1) {
  # Transpose so each original row becomes a column, then take its minimum
  row_min <-
    data %>%
    t() %>%
    as_tibble(.name_repair = "unique") %>%
    map_dbl(min) %>%
    unname()
  data[row_min > min_threshold, ]
}
# Examples ----------------------------------------------------------------
data %>%
  lesser_max(max_threshold)
data %>%
  greater_min(min_threshold)
Upvotes: 0
Reputation: 5138
A couple more options that should scale pretty well:
library(dplyr)
# a more dplyr-y option
iris %>%
filter_all(any_vars(. > 5))
# or taking advantage of base functions
iris %>%
filter(do.call(pmax, as.list(.))>5)
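As a side note, the scoped verbs such as `filter_all()` have been superseded in newer dplyr releases. A rough equivalent using `if_any()` might look like the sketch below; it assumes dplyr 1.0.4 or later, and the threshold of 7.5 is just an arbitrary value chosen for illustration. Restricting the test to numeric columns avoids comparing the factor column `Species` against a number.

```r
library(dplyr)

# Keep rows where at least one numeric column exceeds the (arbitrary) threshold.
# where(is.numeric) skips the factor column Species.
result <- iris %>%
  filter(if_any(where(is.numeric), ~ .x > 7.5))
```

Using `if_all()` in place of `if_any()` would instead require every numeric column to pass the test.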
Upvotes: 2
Reputation: 11150
Here's a way to keep rows that have at least one value above the threshold. To keep rows with a value below the threshold instead, just reverse the inequality inside any():
data %>%
filter(apply(., 1, function(x) any(x > threshold)))
Actually, @r2evans has a better answer in the comments:
data %>%
filter(rowSums(. > threshold) >= 1)
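Since the question asks about values above or below a threshold, the `rowSums` idea can probably be extended to check both bounds at once. The sketch below uses made-up data; the names `df`, `lower`, and `upper` are assumptions for the example. It relies on the magrittr pipe, where `.` refers to the whole data frame, and assumes all columns are numeric.

```r
library(dplyr)

# Toy data; `df`, `lower`, and `upper` are made-up names for this sketch
df <- tibble(
  x = c(0.5, 0.95, 0.4),
  y = c(0.5, 0.3, 0.05),
  z = c(0.5, 0.6, 0.7)
)
lower <- 0.1
upper <- 0.9

# `. > upper | . < lower` yields a logical matrix; rowSums counts the hits per row.
# na.rm = TRUE makes missing values count as "no hit" instead of propagating NA.
result <- df %>%
  filter(rowSums(. > upper | . < lower, na.rm = TRUE) >= 1)
```

Here the first row has no value outside the range from `lower` to `upper`, so it is dropped, while the other two rows are kept.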
Upvotes: 3