Stevestingray
Stevestingray

Reputation: 453

how to keep only rows that have highest value in certain column in R

I have a dataframe that looks like this:

library(tidyverse)

df <- tribble (
  ~Species, ~North, ~South, ~East, ~West,
  "a", 4, 3, 2, 3,
  "b", 2, 3, 4, 5, 
  "C", 2, 3, 3, 3,
  "D", 3, 2, 2, 2
)

I want to filter for species that where the highest value is e.g. North. In this case, species A and D would be selected. Expected output would be a df with only species A and D in it. I used a workaround like this:

df %>%
group_by(species) %>%
mutate(rowmean = mean(North:West) %>%
filter(North > rowmean) %>%
ungroup() %>%
select(!rowmean)

which seems like a lot of code for a simple task! I cant however find a way to do this more codefriendly. Is there a (preferably tidyverse) way to perform this task in a more clean way?

Kind regards

Upvotes: 1

Views: 2014

Answers (2)

Yuriy Saraykin
Yuriy Saraykin

Reputation: 8880

# base
df[df$North > rowMeans(df[-1]), ]  
# A tibble: 2 x 5
  Species North South  East  West
  <chr>   <dbl> <dbl> <dbl> <dbl>
1 a           4     3     2     3
2 D           3     2     2     2

Upvotes: 2

akrun
akrun

Reputation: 887881

An easier approach is with max.col in base R. Select the columns that are numeric. Get the column index of each row where the value is max. Check if that is equal to 1 i.e. the first column (as we selected only from 2nd column onwards) and subset the rows

subset(df, max.col(df[-1], 'first') == 1)
# A tibble: 2 x 5
#  Species North South  East  West
#  <chr>   <dbl> <dbl> <dbl> <dbl>
#1 a           4     3     2     3
#2 D           3     2     2     2

If it is based on the rowwise mean

subset(df, North > rowMeans(df[-1]))

Or if we prefer to use dplyr

library(dplyr)
df %>%
   filter(max.col(cur_data()[-1], 'first') == 1)

Similarly if it based on the rowwise mean

df %>% 
    filter(North > rowMeans(cur_data()[-1]))

Upvotes: 3

Related Questions