Xaviermoros
Xaviermoros

Reputation: 147

Subset a data frame based on year column with the closest value below and farthest value above another year column

Subset dataframe based on PipeYear with the closest value below PropertyYearBuilt and farthest value above the PropertyYearBuilt using the following R code:

df <- read.table(text="
PipeID  PricePipe   PipeYear PropertyYearBuilt  Distance_to_property
    a       500         2010     2013               1.5
    b       600         2007     2008               2.5
    c       700         2009     2008               3.0
    d       800         1998     2000               4.2
    e       900         2003     2000               4.5
    f       200         2014     2013               5.0
    g       100         2011     2013               5.5
    h       850         2018     2008               7.0", header = TRUE)

Thanks!

Upvotes: 1

Views: 44

Answers (1)

Andy Baxter
Andy Baxter

Reputation: 7626

Similar answer to one I posted here (if you have latest dplyr updates), but this time the furthest above is just the max grouped by PropertyID:

library(tidyverse)

df <- read.table(text="
PipeID  PricePipe   PipeYear PropertyYearBuilt  Distance_to_property
    a       500         2010     2013               1.5
    b       600         2007     2008               2.5
    c       700         2009     2008               3.0
    d       800         1998     2000               4.2
    e       900         2003     2000               4.5
    f       200         2014     2013               5.0
    g       100         2011     2013               5.5
    h       850         2018     2008               7.0", header = TRUE) |> 
  mutate(PropertyID = as.numeric(as.factor(PropertyYearBuilt)))

bind_rows(
  df |>
    select(PropertyYearBuilt, PropertyID) |>
    unique() |>
    left_join(
      df |> select(-PropertyYearBuilt),
      join_by(PropertyID, closest(PropertyYearBuilt >= PipeYear))
    ),
df |> 
  group_by(PropertyYearBuilt) |> 
  filter(PipeYear == max(PipeYear))
) |> 
  arrange(PropertyID, PipeYear)
#>   PropertyYearBuilt PropertyID PipeID PricePipe PipeYear Distance_to_property
#> 1              2000          1      d       800     1998                  4.2
#> 2              2000          1      e       900     2003                  4.5
#> 3              2008          2      b       600     2007                  2.5
#> 4              2008          2      h       850     2018                  7.0
#> 5              2013          3      g       100     2011                  5.5
#> 6              2013          3      f       200     2014                  5.0

Upvotes: 1

Related Questions