James White
James White

Reputation: 815

Calculating percentile of values from separate grouped dataframes

I have two dataframes, one (df1) contains the minimum 'flow' value for specific events observed at different sites. The second dataframe (df2) contains the complete flow rime series.

df1 <- data.frame(Event = as.character(seq(1,9,1)), 
              Site_ID = rep(c("a","b","c"),each=3), 
              Flow = rnorm(9,2,1))

 df2 <- data.frame(Site_ID = rep(c("a","b","c"),each=20), 
              Flow = rnorm(60,2,1))

I'd like to use dplyr to go through each 'Flow' value in df1, and calculate its percentile value relative to the complete time series of the corresponding site in df2 using ecdf. Any help would be much appreciated. Thanks.

Upvotes: 3

Views: 57

Answers (2)

ThomasIsCoding
ThomasIsCoding

Reputation: 101335

You can try the code below

df1 %>%
  mutate(
    Quantile =
      ecdf(df2$Flow[df2$Site_ID == first(Site_ID)])(Flow),
    .by = Site_ID
  )

Upvotes: 2

Allan Cameron
Allan Cameron

Reputation: 173803

Here's a method using group_split and map2:

library(tidyverse)

df1 %>%
  mutate(quantile = map2(group_split(df1, Site_ID), 
                         group_split(df2, Site_ID), 
                         ~ecdf(.y$Flow)(.x$Flow)) |> unlist())
#>   Event Site_ID     Flow quantile
#> 1     1       a 2.004936     0.55
#> 2     2       a 2.072371     0.55
#> 3     3       a 1.767891     0.45
#> 4     4       b 2.686329     0.70
#> 5     5       b 1.517909     0.30
#> 6     6       b 1.574355     0.35
#> 7     7       c 1.606521     0.30
#> 8     8       c 1.680232     0.30
#> 9     9       c 1.604164     0.30

Upvotes: 2

Related Questions