Zelbinian
Zelbinian

Reputation: 3407

In R, how can I filter based on the maximum value in each row of my data?

I have a tibble (or data frame, if you like) that is 19 columns of pure numerical data and I want to filter it down to only the rows where at least one value is above or below a threshold. I prefer a tidyverse/dplyr solution but whatever works is fine.

This is related to this question but a distinct in at least two ways that I can see:

  1. I have no identifier column (besides the row number, I suppose)
  2. I need to subset based on the max across the current row being evaluated, not across a column

Here are attempts I've tried:

data %>% filter(max(.) < 8)
data %>% filter(max(value) < 8)
data %>% slice(which.max(.))

Upvotes: 1

Views: 3058

Answers (4)

akrun
akrun

Reputation: 887851

We can use base R methods

data[Reduce(`|`, lapply(data, `>`, threshold)),]`

Upvotes: 0

Maybe there are better and more efficient ways, but these two functions should do what you need if I understood correctly. This solution assumes you have only numerical data.

  • You transpose the tibble (so you obtain a numerical matrix)
  • Then you use map to get the max or min by column (which is the max/min by row in the initial dataset).
  • You obtain the row index you are looking for
  • Finally, you can filter your dataset.

# Random Data -------------------------------------------------------------

data <- as.tibble(replicate(10, runif(20)))

# Threshold to be used -----------------------------------------------------

max_treshold = 0.9
min_treshold = 0.1

# Lesser_max --------------------------------------------------------------

lesser_max = function(data, max_treshold = 0.9) {
  index_max_list =
    data %>%
    t() %>%
    as.tibble() %>%
    map(max) %>%
    unname()

  index_max =
    index_max_list < max_treshold

  data[index_max,]
}

# Greater_min -------------------------------------------------------------

greater_min = function(data, min_treshold = 0.1) {
  index_min_list =
    data %>%
    t() %>%
    as.tibble() %>%
    map(min) %>%
    unname()

  index_min =
    index_min_list > min_treshold

  data[index_min,]
}

# Examples ----------------------------------------------------------------

data %>%
  lesser_max(max_treshold)

data %>%
  greater_min(min_treshold)

Upvotes: 0

Andrew
Andrew

Reputation: 5138

Couple more options that should scale pretty well:

library(dplyr)

# a more dplyr-y option 
iris %>%
      filter_all(any_vars(. > 5))

# or taking advantage of base functions
iris %>%
      filter(do.call(pmax, as.list(.))>5)

Upvotes: 2

Shree
Shree

Reputation: 11150

Here's a way which will keep rows having value above threshold. For keeping values below threshold, just reverse the inequality in any -

data %>% 
  filter(apply(., 1, function(x) any(x > threshold)))

Actually, @r2evans has better answer in comments -

data %>%
  filter(rowSums(. > threshold) >= 1)

Upvotes: 3

Related Questions