Reputation: 453
I have a dataframe that looks like this:
library(tidyverse)
df <- tribble (
~Species, ~North, ~South, ~East, ~West,
"a", 4, 3, 2, 3,
"b", 2, 3, 4, 5,
"C", 2, 3, 3, 3,
"D", 3, 2, 2, 2
)
I want to filter for species that where the highest value is e.g. North. In this case, species A and D would be selected. Expected output would be a df with only species A and D in it. I used a workaround like this:
df %>%
group_by(species) %>%
mutate(rowmean = mean(North:West) %>%
filter(North > rowmean) %>%
ungroup() %>%
select(!rowmean)
which seems like a lot of code for a simple task! I cant however find a way to do this more codefriendly. Is there a (preferably tidyverse) way to perform this task in a more clean way?
Kind regards
Upvotes: 1
Views: 2014
Reputation: 8880
# base
df[df$North > rowMeans(df[-1]), ]
# A tibble: 2 x 5
Species North South East West
<chr> <dbl> <dbl> <dbl> <dbl>
1 a 4 3 2 3
2 D 3 2 2 2
Upvotes: 2
Reputation: 887881
An easier approach is with max.col
in base R
. Select the columns that are numeric. Get the column index of each row where the value is max
. Check if that is equal to 1 i.e. the first column (as we selected only from 2nd column onwards) and subset
the rows
subset(df, max.col(df[-1], 'first') == 1)
# A tibble: 2 x 5
# Species North South East West
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 a 4 3 2 3
#2 D 3 2 2 2
If it is based on the rowwise mean
subset(df, North > rowMeans(df[-1]))
Or if we prefer to use dplyr
library(dplyr)
df %>%
filter(max.col(cur_data()[-1], 'first') == 1)
Similarly if it based on the rowwise mean
df %>%
filter(North > rowMeans(cur_data()[-1]))
Upvotes: 3