Reputation: 81
I have a data frame (precipitation) with columns associating different grouping variables with each sample (a point in a time series):
  Date                Precipitation  Year Month    Season
  <S3: POSIXct>               <dbl> <dbl> <ord>    <fctr>
1 1970-12-31 18:00:00     0.1900503  1970 December Autumn
2 1971-01-01 18:00:00     0.4749126  1971 January  Winter
3 1971-01-02 18:00:00     6.1823234  1971 January  Winter
4 1971-01-03 18:00:00     2.7953697  1971 January  Winter
5 1971-01-04 18:00:00     2.6522014  1971 January  Winter
6 1971-01-05 18:00:00     8.7417027  1971 January  Winter
I would like to filter this data frame by group, using a different threshold for each group. The thresholds are themselves computed per group and summarized in a data frame generated as follows:
percentile <- groupwisePercentile(Precipitation ~ Season, data = precipitation, tau = 0.9)
percentile
Season     n   tau Percentile
<fctr> <int> <dbl>      <dbl>
Autumn  4509   0.9       5.19
Spring  4520   0.9       3.47
Summer  4508   0.9       6.01
Winter  4513   0.9       4.32
I don't know how to refer to the values in this data frame to filter the data frame precipitation groupwise, e.g. using group_by followed by filter in dplyr. For now I am using a very inelegant method that also becomes laborious for groupings with more levels: I filter one subset per group, manually entering the value from the data frame percentile, like this:
filtered_winter <- precipitation %>%
  filter(Season == "Winter") %>%
  filter(Precipitation >= 4.32)
I repeat that for each group, then bind the results:
events <- rbind(filtered_winter, filtered_spring, filtered_summer, filtered_autumn)
How could I generate the same table, i.e. filtered by group with a different threshold for each group, in a more elegant way?
Upvotes: 2
Views: 239
Reputation: 30474
You could try the fuzzyjoin package.
Taking your groupwisePercentile result and data, you can match Season in both data frames, and join where Precipitation is >= Percentile in the summary threshold data.
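library(rcompanion)  # groupwisePercentile()
library(fuzzyjoin)   # fuzzy_inner_join()

# Per-season 90th percentile thresholds
summary_df <- groupwisePercentile(Precipitation ~ Season, data = precipitation, tau = 0.9)

# Keep rows whose Season matches exactly and whose Precipitation
# is at or above that season's Percentile
fuzzy_inner_join(precipitation,
                 summary_df,
                 by = c("Season" = "Season",
                        "Precipitation" = "Percentile"),
                 match_fun = list(`==`, `>=`))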
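If you would rather stay within dplyr, a plain join followed by a filter should give the same rows. The sketch below is only an illustration under assumptions: the two small tibbles are made-up stand-ins for your precipitation data and for the groupwisePercentile output, and only the Season, Precipitation and Percentile column names are taken from your question.

```r
library(dplyr)

# Made-up stand-in for the precipitation data (illustrative values only)
precipitation <- tibble(
  Season = factor(c("Winter", "Winter", "Spring", "Spring")),
  Precipitation = c(5.0, 1.2, 3.0, 4.1)
)

# Made-up stand-in for the groupwisePercentile() result
percentile <- tibble(
  Season = factor(c("Winter", "Spring")),
  Percentile = c(4.32, 3.47)
)

# Attach each season's threshold to its rows, keep the rows at or above it,
# then drop the helper column again
events <- precipitation %>%
  inner_join(select(percentile, Season, Percentile), by = "Season") %>%
  filter(Precipitation >= Percentile) %>%
  select(-Percentile)

# Variant without the summary table: compute the threshold inside each group.
# Assumption: quantile()'s default method is close enough to what
# groupwisePercentile() reports; the two may not agree exactly.
events_q <- precipitation %>%
  group_by(Season) %>%
  filter(Precipitation >= quantile(Precipitation, 0.9)) %>%
  ungroup()
```

The join version reuses exactly the thresholds you already computed, so it is the safer choice if the numbers must match your percentile table. It also scales to groupings with more levels, or to several grouping columns, by extending the by argument.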
Upvotes: 1