Reputation: 815
I have two dataframes, one (df1) contains the minimum 'flow' value for specific events observed at different sites. The second dataframe (df2) contains the complete flow rime series.
df1 <- data.frame(Event = as.character(seq(1,9,1)),
Site_ID = rep(c("a","b","c"),each=3),
Flow = rnorm(9,2,1))
df2 <- data.frame(Site_ID = rep(c("a","b","c"),each=20),
Flow = rnorm(60,2,1))
I'd like to use dplyr to go through each 'Flow' value in df1, and calculate its percentile value relative to the complete time series of the corresponding site in df2 using ecdf. Any help would be much appreciated. Thanks.
Upvotes: 3
Views: 57
Reputation: 101335
You can try the code below
df1 %>%
mutate(
Quantile =
ecdf(df2$Flow[df2$Site_ID == first(Site_ID)])(Flow),
.by = Site_ID
)
Upvotes: 2
Reputation: 173803
Here's a method using group_split
and map2
:
library(tidyverse)
df1 %>%
mutate(quantile = map2(group_split(df1, Site_ID),
group_split(df2, Site_ID),
~ecdf(.y$Flow)(.x$Flow)) |> unlist())
#> Event Site_ID Flow quantile
#> 1 1 a 2.004936 0.55
#> 2 2 a 2.072371 0.55
#> 3 3 a 1.767891 0.45
#> 4 4 b 2.686329 0.70
#> 5 5 b 1.517909 0.30
#> 6 6 b 1.574355 0.35
#> 7 7 c 1.606521 0.30
#> 8 8 c 1.680232 0.30
#> 9 9 c 1.604164 0.30
Upvotes: 2