Reputation: 642
I have a data frame with hundreds of names and hundreds of values per name. I want to filter out some of the values based on a mathematical rule applied only to a certain subset of the data. A simplified example would be filtering out the max value for each name.
I can hard-code it as shown below, but would love to avoid that.
library(dplyr)
##
names <- c('A', 'A', 'B', 'B')
values <- c(1,2,3,4)
df <- data.frame(names, values)
##
df %>%
  filter(
    names != 'A' | values != max(subset(df, names == 'A')$values),
    names != 'B' | values != max(subset(df, names == 'B')$values)
  )
Desired output:
names values
1 A 1
2 B 3
I would consider a loop inside the dplyr filter that calculates the max value per name and then applies the corresponding condition for each name, if that is possible.
Upvotes: 0
Views: 236
Reputation: 887311
An option in base R:
subset(df, values != ave(values, names, FUN = max))
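To see why this works, it may help to look at what `ave()` returns: a vector as long as `values`, in which each element is replaced by the maximum of its group. A minimal sketch using the question's data (the intermediate names `group_max` and `result` are only for illustration):

```r
names <- c('A', 'A', 'B', 'B')
values <- c(1, 2, 3, 4)
df <- data.frame(names, values)

# ave() applies FUN within each level of `names` and returns
# a vector aligned with the original rows
group_max <- ave(df$values, df$names, FUN = max)
group_max
# [1] 2 2 4 4

# keep only the rows whose value differs from their group's maximum
result <- df[df$values != group_max, ]
```

Because the comparison is row-by-row against the group maximum, no loop over the names is needed.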
Upvotes: 0
Reputation: 145870
Filtering out the max value for each name:
df %>%
group_by(names) %>%
filter(values != max(values))
# # A tibble: 2 x 2
# # Groups: names [2]
# names values
# <chr> <dbl>
# 1 A 1
# 2 B 3
Or if you mean removing the max values per name from the entire data frame, whenever they occur:
df %>%
group_by(names) %>%
slice_max(values) %>%
select(values) %>%
anti_join(df, ., by = "values")
#   names values
# 1     A      1
# 2     B      3
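On the question's data both approaches give the same rows, so the difference only shows up when one group's max also occurs in another group. A small sketch with hypothetical data (`df2` is made up for illustration, and `summarise()` is swapped in for `slice_max()` + `select()` to get the per-group maxima):

```r
library(dplyr)

# hypothetical data: the value 2 is the max of A but not the max of B
df2 <- data.frame(names = c('A', 'A', 'B', 'B'), values = c(1, 2, 2, 4))

# per-group filter: drops a row only if it is the max of its own group
per_group <- df2 %>%
  group_by(names) %>%
  filter(values != max(values)) %>%
  ungroup()
# keeps (A, 1) and (B, 2)

# global removal: drops every row whose value equals any group's max
group_maxes <- df2 %>%
  group_by(names) %>%
  summarise(values = max(values))
everywhere <- anti_join(df2, group_maxes, by = "values")
# keeps only (A, 1), because 2 is removed wherever it occurs
```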
Upvotes: 1