vpatel
vpatel

Reputation: 53

How to create an indicator variable for the lowest and highest date with skipping rule in R?

I want an indicator variable that tells me if the date is the lowest or highest when its group by ID. However, I don't want it to count anything that has practice as 1. This is what the data frame looks now and what I want it to look like.

ID date practice
1 02-27-2020 1
1 04-21-2021 0
1 06-24-2022 0
2 03-21-2019 0
2 09-19-2020 0
2 01-21-2021 0

End product:

ID date practice lowest highest
1 02-27-2020 1 0 0
1 04-21-2021 0 1 0
1 06-24-2022 0 0 1
2 03-21-2019 0 1 0
2 09-19-2020 0 0 0
2 01-21-2021 0 0 1

Upvotes: 1

Views: 68

Answers (3)

langtang
langtang

Reputation: 24742

A data.table approach:

f <- function(x,p) list(1*(x==min(x[p!=1])), 1*(x==max(x[p!=1])))
setDT(df)[,date:=as.IDate(date, "%m-%d-%Y")][,c("lowest","highest"):=f(date,practice), by=ID][]

Output:

      ID       date practice lowest highest
   <int>     <IDat>    <int>  <num>   <num>
1:     1 2020-02-27        1      0       0
2:     1 2021-04-21        0      1       0
3:     1 2022-06-24        0      0       1
4:     2 2019-03-21        0      1       0
5:     2 2020-09-19        0      0       0
6:     2 2021-01-21        0      0       1

Input:

structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), date = c("02-27-2020", 
"04-21-2021", "06-24-2022", "03-21-2019", "09-19-2020", "01-21-2021"
), practice = c(1L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, -6L
), class = "data.frame")

Upvotes: 0

markyoung
markyoung

Reputation: 179

This code delivered the desired solution using the tidyverse. Note that I had to force date into a date format; by default the mm-dd-YYYY format gets read as character, which causes min(date) and max(date) to read different values as min and max.

x<-'
ID  date    practice
1   02-27-2020  1
1   04-21-2021  0
1   06-24-2022  0
2   03-21-2019  0
2   09-19-2020  0
2   01-21-2021  0'

df1 <- read.table(textConnection(x), header = TRUE)
library(tidyverse)

df1$date <- as.Date(df1$date, format = "%m-%d-%Y")

desired_result <- df1 %>%
  group_by(ID) %>%
  mutate(
    lowest = ifelse(date == min(date[practice == 0]), 1, 0),
    highest = ifelse(date == max(date[practice == 0]), 1, 0)
  )

desired_result
# A tibble: 6 × 5
# Groups:   ID [2]
     ID date       practice lowest highest
  <int> <date>        <int>  <dbl>   <dbl>
1     1 2020-02-27        1      0       0
2     1 2021-04-21        0      1       0
3     1 2022-06-24        0      0       1
4     2 2019-03-21        0      1       0
5     2 2020-09-19        0      0       0
6     2 2021-01-21        0      0       1

Upvotes: 1

Rui Barradas
Rui Barradas

Reputation: 76575

Here is a base R solution with ave.

x<-'
ID  date    practice
1   02-27-2020  1
1   04-21-2021  0
1   06-24-2022  0
2   03-21-2019  0
2   09-19-2020  0
2   01-21-2021  0'

df1 <- read.table(textConnection(x), header = TRUE)
df1$date <- as.Date(df1$date, "%m-%d-%Y")

y1 <- with(df1, ave(as.integer(date), ID, practice, FUN = \(x) {
  if(length(x))
    min(x) == x
  else NULL
}))
y2 <- with(df1, ave(as.integer(date), ID, practice, FUN = \(x) {
  if(length(x))
    max(x) == x
  else NULL
}))

df1$lowest <- as.integer(y1 & (df1$practice != 1))
df1$highest <- as.integer(y2 & (df1$practice != 1))
df1
#>   ID       date practice lowest highest
#> 1  1 2020-02-27        1      0       0
#> 2  1 2021-04-21        0      1       0
#> 3  1 2022-06-24        0      0       1
#> 4  2 2019-03-21        0      1       0
#> 5  2 2020-09-19        0      0       0
#> 6  2 2021-01-21        0      0       1

Created on 2022-04-25 by the reprex package (v2.0.1)

Upvotes: 0

Related Questions