MsGISRocker

Reputation: 642

Filter a data frame based on multiple dynamic conditions that depend on a subset of the data, e.g. by applying a loop

I have a data frame with hundreds of names and hundreds of values per name. I want to filter out some of the values based on a mathematical rule that is applied only to a certain subset of the data. A simplified example would be filtering out the max value for each name.

I can hard-code it as shown below, but would love to avoid that.

library(dplyr)
##
names <- c('A', 'A', 'B', 'B')
values <- c(1,2,3,4)
df <- data.frame(names, values)
##
df %>%
  filter(names != 'A' | values != max(subset(df, names == 'A')$values),
         names != 'B' | values != max(subset(df, names == 'B')$values))

Desired output:

  names values
1     A      1
2     B      3

I would consider creating a loop within a dplyr filter that calculates the max value per name and then applies both conditions within the filter, if that is possible.

Upvotes: 0

Views: 236

Answers (2)

akrun

Reputation: 887311

An option in base R:

subset(df, values != ave(values, names, FUN = max))
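To see why this works, it can help to look at the intermediate vector that `ave()` produces. The sketch below rebuilds the question's `df` and stores the per-name maxima in `group_max`, a variable name introduced here only for illustration:

```r
# Rebuild the example data from the question
names <- c('A', 'A', 'B', 'B')
values <- c(1, 2, 3, 4)
df <- data.frame(names, values)

# ave() returns a vector as long as `values`;
# each element holds the maximum of that row's name group
group_max <- ave(df$values, df$names, FUN = max)
group_max
# 2 2 4 4

# Keep only the rows whose value differs from their group's maximum
result <- subset(df, values != group_max)
```

Because the comparison is element-wise against a full-length vector, no loop over the names is needed. Swapping `max` for another function passed as `FUN` should give a different per-name rule.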

Upvotes: 0

Gregor Thomas

Reputation: 145870

Filtering out max value for each name:

df %>% 
  group_by(names) %>%
  filter(values != max(values))

# # A tibble: 2 x 2
# # Groups:   names [2]
#   names values
#   <chr>  <dbl>
# 1 A          1
# 2 B          3

Or if you mean removing the max values per name from the entire data frame, whenever they occur:

df %>% 
  group_by(names) %>%
  slice_max(values) %>%
  select(values) %>%
  anti_join(df, ., by = "values")

# # A tibble: 2 x 2
# # Groups:   names [2]
#   names values
#   <chr>  <dbl>
# 1 A          1
# 2 B          3
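On the question's data both approaches give the same rows, so the difference only shows up when one name's max also occurs under another name. A minimal sketch, assuming a made-up data frame `df2` (not from the question) in which the value 2 is the max for A but not for B:

```r
library(dplyr)

# Hypothetical data: 2 is the max of A and also appears in B
df2 <- data.frame(names = c('A', 'A', 'B', 'B'),
                  values = c(1, 2, 2, 3))

# Approach 1: drop each name's own max only
per_group <- df2 %>%
  group_by(names) %>%
  filter(values != max(values)) %>%
  ungroup()
# keeps (A, 1) and (B, 2)

# Approach 2: collect every per-name max, then drop those values everywhere
group_maxima <- df2 %>%
  group_by(names) %>%
  slice_max(values) %>%
  ungroup() %>%
  select(values)

anywhere <- anti_join(df2, group_maxima, by = "values")
# keeps only (A, 1), because 2 and 3 are both a max for some name
```

The `ungroup()` before `select()` is my addition here; it just avoids dplyr re-adding the grouping column to the lookup table.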

Upvotes: 1
