Lisa

Filter a data frame groupwise in R using dplyr

I have a data frame, precipitation, whose columns assign several grouping variables (Year, Month, Season) to each sample (a point in a time series):

    Date                Precipitation  Year     Month       Season
    <S3: POSIXct>       <dbl>          <dbl>    <ord>       <fctr>
1   1970-12-31 18:00:00 0.1900503      1970     December    Autumn
2   1971-01-01 18:00:00 0.4749126      1971     January     Winter
3   1971-01-02 18:00:00 6.1823234      1971     January     Winter
4   1971-01-03 18:00:00 2.7953697      1971     January     Winter
5   1971-01-04 18:00:00 2.6522014      1971     January     Winter
6   1971-01-05 18:00:00 8.7417027      1971     January     Winter

I would like to filter this data frame by group, using a different threshold for each group. The thresholds are computed per group and summarized in a data frame generated as follows:

library(rcompanion)

percentile <- groupwisePercentile(Precipitation ~ Season, data = precipitation, tau = 0.9)
percentile

Season  n      tau    Percentile
<fctr>  <int>  <dbl>  <dbl>
Autumn  4509   0.9    5.19  
Spring  4520   0.9    3.47  
Summer  4508   0.9    6.01  
Winter  4513   0.9    4.32  
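If rcompanion is not at hand, a threshold table of the same shape can be sketched with dplyr's summarise(). This is a minimal sketch that assumes the threshold is the 0.9 quantile computed with quantile()'s default method (type 7); the values may differ slightly from groupwisePercentile() if that function is run with another quantile type. The small data frame below is hypothetical toy data, not the question's data.

```r
library(dplyr)

# Hypothetical toy data standing in for the question's `precipitation` frame
precipitation <- data.frame(
  Precipitation = c(0.2, 5.0, 7.1, 0.5, 6.2, 2.8, 1.0, 3.9, 0.3),
  Season = factor(c("Autumn", "Autumn", "Autumn",
                    "Winter", "Winter", "Winter",
                    "Spring", "Spring", "Spring"))
)

prob <- 0.9

# One row per season: group size, tau and the tau-quantile of Precipitation
percentile <- precipitation %>%
  group_by(Season) %>%
  summarise(n = n(),
            tau = prob,
            Percentile = quantile(Precipitation, probs = prob, names = FALSE))

percentile
```

The rows come out in the order of the Season factor levels (Autumn, Spring, Winter here).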

I don't know how to refer to the values in this data frame when filtering precipitation by group, e.g. using group_by followed by filter in dplyr. For now I use an inelegant method that becomes laborious for groupings with more levels: I filter each group separately, typing in its threshold by hand from the percentile data frame, and then concatenate the subsets, like this:

library(dplyr)

filtered_winter <- precipitation %>%
  filter(Season == "Winter") %>%
  filter(Precipitation >= 4.32)  # threshold typed in by hand from the percentile table

I repeat this for each group and then bind the results:

events <- rbind(filtered_winter,filtered_spring,filtered_summer,filtered_autumn)

How can I produce the same table, i.e. filtered by group with a different threshold for each group, in a more elegant way?
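The group_by() followed by filter() pattern mentioned above does work if the threshold is computed inside filter(): after group_by(Season), the expression quantile(Precipitation, 0.9) is evaluated separately within each season. This is a sketch assuming the thresholds are plain 90th percentiles from quantile()'s default method; the toy data frame is hypothetical.

```r
library(dplyr)

# Hypothetical toy data (not the question's data)
precipitation <- data.frame(
  Precipitation = c(0.2, 5.0, 7.1, 0.5, 6.2, 2.8, 1.0, 3.9, 0.3),
  Season = factor(c("Autumn", "Autumn", "Autumn",
                    "Winter", "Winter", "Winter",
                    "Spring", "Spring", "Spring"))
)

# Keep each row whose value is at or above its own season's 90th percentile;
# quantile() is evaluated once per Season group
events <- precipitation %>%
  group_by(Season) %>%
  filter(Precipitation >= quantile(Precipitation, probs = 0.9)) %>%
  ungroup()

events
```

A grouped filter() keeps the original row order, so no rbind() step is needed and the code does not change as more seasons (or other grouping levels) are added.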

Answers (1)

Ben

You could try the fuzzyjoin package.

Using your data and the groupwisePercentile() result, you can match Season exactly in both data frames and keep only the pairs where Precipitation is >= the Percentile in the threshold table:

library(rcompanion)
library(fuzzyjoin)

summary_df <- groupwisePercentile(Precipitation ~ Season, data = precipitation, tau = 0.9)

# Season must match exactly; Precipitation must be at or above that season's Percentile
fuzzy_inner_join(precipitation,
                 summary_df,
                 by = c("Season" = "Season",
                        "Precipitation" = "Percentile"),
                 match_fun = list(`==`, `>=`))
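Because the Season condition in the fuzzy join is a plain equality, the same rows can also be obtained with an ordinary dplyr join followed by filter(). This avoids the fuzzyjoin dependency and keeps a single Season column (fuzzy_inner_join() keeps both key columns, suffixed .x and .y). In this sketch the threshold table is typed in from the question's percentile output, and the precipitation rows are hypothetical toy data.

```r
library(dplyr)

# Thresholds copied from the question's groupwisePercentile() output
percentile <- data.frame(
  Season = c("Autumn", "Spring", "Summer", "Winter"),
  Percentile = c(5.19, 3.47, 6.01, 4.32)
)

# Hypothetical toy rows standing in for `precipitation`
precipitation <- data.frame(
  Precipitation = c(0.19, 6.18, 2.80, 3.50, 7.00, 5.00),
  Season = c("Autumn", "Winter", "Winter", "Spring", "Summer", "Autumn")
)

# Attach each row's seasonal threshold (exact match on Season),
# then keep only the rows at or above it
events <- precipitation %>%
  inner_join(percentile, by = "Season") %>%
  filter(Precipitation >= Percentile)

events
```

The fuzzy join is still the more general tool when every condition is an inequality (e.g. date ranges); for one exact key plus one inequality, join-then-filter is usually enough.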
