Sydney
Sydney

Reputation: 11

ordering responses based on individual dates

I have time series data in my data frame that is organized by subject ID and response date, where each subject responded for 7 days but starting and ending on different days. I need to come up with a variable for response day that indexes by each subjects' first day. See example:

id  response
101 11/2/2017
101 11/2/2017
101 11/3/2017
101 11/3/2017
101 11/3/2017
101 11/3/2017
102 12/14/2017
102 12/15/2017

and would like to create the column "day"

id  response    day
101 11/2/2017   1
101 11/2/2017   1
101 11/3/2017   2
101 11/3/2017   2
101 11/3/2017   2
101 11/3/2017   2
102 12/14/2017  1
102 12/15/2017  2

I've been trying to do this in dplyr and can't find the right code. Thank you!

Upvotes: 1

Views: 70

Answers (4)

Uwe
Uwe

Reputation: 42564

If the rows are already ordered by id and response, data.table's rleid() function can be used:

library(data.table)
setDT(DF)[, day := rleid(response), by = id][]
    id   response day
1: 101  11/2/2017   1
2: 101  11/2/2017   1
3: 101  11/3/2017   2
4: 101  11/3/2017   2
5: 101  11/3/2017   2
6: 101  11/3/2017   2
7: 102 12/14/2017   1
8: 102 12/15/2017   2

The correct row order is important for rleid() to return the expected result.

If the rows are not already ordered, the data string in the response column needs to be coerced to class Date first. Then it can be used to order the rows.

For instance, with an unordered dataset DF2

library(data.table)
library(lubridate)
set.seed(123L)
DF2 <- setDT(DF)[sample.int(.N)]
DF2
    id   response
1: 101  11/3/2017
2: 101  11/3/2017
3: 102 12/15/2017
4: 101  11/3/2017
5: 101  11/3/2017
6: 101  11/2/2017
7: 101  11/2/2017
8: 102 12/14/2017
DF2[, response := mdy(response)][order(response), day := rleid(response), by = id][]
    id   response day
1: 101 2017-11-03   2
2: 101 2017-11-03   2
3: 102 2017-12-15   2
4: 101 2017-11-03   2
5: 101 2017-11-03   2
6: 101 2017-11-02   1
7: 101 2017-11-02   1
8: 102 2017-12-14   1

The original row order of DF2 has not been changed, but the days are numbered as requested. This is difficult to see unless printed in the right order:

DF2[order(id, response)]
    id   response day
1: 101 2017-11-02   1
2: 101 2017-11-02   1
3: 101 2017-11-03   2
4: 101 2017-11-03   2
5: 101 2017-11-03   2
6: 101 2017-11-03   2
7: 102 2017-12-14   1
8: 102 2017-12-15   2

Data

library(data.table)
DF <- fread(
  "id  response
  101 11/2/2017
  101 11/2/2017
  101 11/3/2017
  101 11/3/2017
  101 11/3/2017
  101 11/3/2017
  102 12/14/2017
  102 12/15/2017"
)

Upvotes: 1

MKR
MKR

Reputation: 20095

One solution can be by simply use group_by and difference from 1st record of a group. Since the response is of type Date, we need to add to evaluate same day response as 1. The example.

df <- read.table(text = "id  response
101 '11/2/2017'
101 '11/2/2017'
101 '11/3/2017'
101 '11/3/2017'
101 '11/3/2017'
101 '11/3/2017'
102 '12/14/2017'
102 '12/15/2017'", header = T, stringsAsFactors = F)

df$response <- as.Date(df$response, format = "%m/%d/%Y")

library(dplyr)

df %>% group_by(id) %>%
  arrange(id, response) %>%
  mutate(day = response - first(response)+1)

     id response   day   
  <int> <date>     <time>
1   101 2017-11-02 1     
2   101 2017-11-02 1     
3   101 2017-11-03 2     
4   101 2017-11-03 2     
5   101 2017-11-03 2     
6   101 2017-11-03 2     
7   102 2017-12-14 1     
8   102 2017-12-15 2  

Upvotes: 4

Onyambu
Onyambu

Reputation: 79288

We can group_by id then just take the response as a factor and convert to numeric.

dat%>%group_by(id)%>%mutate(day=as.numeric(factor(response))
# A tibble: 8 x 3
# Groups:   id [2]
     id   response   day
  <int>      <chr> <dbl>
1   101  11/2/2017     1
2   101  11/2/2017     1
3   101  11/3/2017     2
4   101  11/3/2017     2
5   101  11/3/2017     2
6   101  11/3/2017     2
7   102 12/14/2017     1
8   102 12/15/2017     2

if the dates are not arranged then you can add the levels to the factor: dat%>%group_by(id)%>%mutate(day=as.numeric(factor(response,unique(response))))

Upvotes: 1

www
www

Reputation: 39154

We can use dense_rank from . The mutate and arrange calls here are just to make sure the date are in the right order. You can ignore that if you are certain the date are in the right order.

library(dplyr)
library(lubridate)

dat2 <- dat %>%
  mutate(response = mdy(response)) %>%
  arrange(id, response) %>%
  group_by(id) %>%
  mutate(day = dense_rank(response)) %>%
  ungroup()
dat2
# # A tibble: 8 x 3
#      id response     day
#   <int> <date>     <int>
# 1   101 2017-11-02     1
# 2   101 2017-11-02     1
# 3   101 2017-11-03     2
# 4   101 2017-11-03     2
# 5   101 2017-11-03     2
# 6   101 2017-11-03     2
# 7   102 2017-12-14     1
# 8   102 2017-12-15     2

DATA

dat <- read.table(text = "id  response
101 '11/2/2017'
                  101 '11/2/2017'
                  101 '11/3/2017'
                  101 '11/3/2017'
                  101 '11/3/2017'
                  101 '11/3/2017'
                  102 '12/14/2017'
                  102 '12/15/2017'",
                  header = TRUE, stringsAsFactors = FALSE)

Upvotes: 2

Related Questions