Reputation: 11
I have time series data in my data frame that is organized by subject ID and response date, where each subject responded for 7 days but starting and ending on different days. I need to come up with a variable for response day that indexes by each subjects' first day. See example:
id response
101 11/2/2017
101 11/2/2017
101 11/3/2017
101 11/3/2017
101 11/3/2017
101 11/3/2017
102 12/14/2017
102 12/15/2017
and would like to create the column "day"
id response day
101 11/2/2017 1
101 11/2/2017 1
101 11/3/2017 2
101 11/3/2017 2
101 11/3/2017 2
101 11/3/2017 2
102 12/14/2017 1
102 12/15/2017 2
I've been trying to do this in dplyr and can't find the right code. Thank you!
Upvotes: 1
Views: 70
Reputation: 42564
If the rows are already ordered by id
and response
, data.table
's rleid()
function can be used:
library(data.table)
setDT(DF)[, day := rleid(response), by = id][]
id response day 1: 101 11/2/2017 1 2: 101 11/2/2017 1 3: 101 11/3/2017 2 4: 101 11/3/2017 2 5: 101 11/3/2017 2 6: 101 11/3/2017 2 7: 102 12/14/2017 1 8: 102 12/15/2017 2
The correct row order is important for rleid()
to return the expected result.
If the rows are not already ordered, the data string in the response
column needs to be coerced to class Date
first. Then it can be used to order the rows.
For instance, with an unordered dataset DF2
library(data.table)
library(lubridate)
set.seed(123L)
DF2 <- setDT(DF)[sample.int(.N)]
DF2
id response 1: 101 11/3/2017 2: 101 11/3/2017 3: 102 12/15/2017 4: 101 11/3/2017 5: 101 11/3/2017 6: 101 11/2/2017 7: 101 11/2/2017 8: 102 12/14/2017
DF2[, response := mdy(response)][order(response), day := rleid(response), by = id][]
id response day 1: 101 2017-11-03 2 2: 101 2017-11-03 2 3: 102 2017-12-15 2 4: 101 2017-11-03 2 5: 101 2017-11-03 2 6: 101 2017-11-02 1 7: 101 2017-11-02 1 8: 102 2017-12-14 1
The original row order of DF2
has not been changed, but the days are numbered as requested. This is difficult to see unless printed in the right order:
DF2[order(id, response)]
id response day 1: 101 2017-11-02 1 2: 101 2017-11-02 1 3: 101 2017-11-03 2 4: 101 2017-11-03 2 5: 101 2017-11-03 2 6: 101 2017-11-03 2 7: 102 2017-12-14 1 8: 102 2017-12-15 2
library(data.table)
DF <- fread(
"id response
101 11/2/2017
101 11/2/2017
101 11/3/2017
101 11/3/2017
101 11/3/2017
101 11/3/2017
102 12/14/2017
102 12/15/2017"
)
Upvotes: 1
Reputation: 20095
One solution can be by simply use group_by
and difference from 1st record of a group. Since the response
is of type Date
, we need to add
to evaluate same day response as 1
. The example.
df <- read.table(text = "id response
101 '11/2/2017'
101 '11/2/2017'
101 '11/3/2017'
101 '11/3/2017'
101 '11/3/2017'
101 '11/3/2017'
102 '12/14/2017'
102 '12/15/2017'", header = T, stringsAsFactors = F)
df$response <- as.Date(df$response, format = "%m/%d/%Y")
library(dplyr)
df %>% group_by(id) %>%
arrange(id, response) %>%
mutate(day = response - first(response)+1)
id response day
<int> <date> <time>
1 101 2017-11-02 1
2 101 2017-11-02 1
3 101 2017-11-03 2
4 101 2017-11-03 2
5 101 2017-11-03 2
6 101 2017-11-03 2
7 102 2017-12-14 1
8 102 2017-12-15 2
Upvotes: 4
Reputation: 79288
We can group_by
id then just take the response as a factor and convert to numeric.
dat%>%group_by(id)%>%mutate(day=as.numeric(factor(response))
# A tibble: 8 x 3
# Groups: id [2]
id response day
<int> <chr> <dbl>
1 101 11/2/2017 1
2 101 11/2/2017 1
3 101 11/3/2017 2
4 101 11/3/2017 2
5 101 11/3/2017 2
6 101 11/3/2017 2
7 102 12/14/2017 1
8 102 12/15/2017 2
if the dates are not arranged then you can add the levels to the factor:
dat%>%group_by(id)%>%mutate(day=as.numeric(factor(response,unique(response))))
Upvotes: 1
Reputation: 39154
We can use dense_rank
from dplyr. The mutate
and arrange
calls here are just to make sure the date are in the right order. You can ignore that if you are certain the date are in the right order.
library(dplyr)
library(lubridate)
dat2 <- dat %>%
mutate(response = mdy(response)) %>%
arrange(id, response) %>%
group_by(id) %>%
mutate(day = dense_rank(response)) %>%
ungroup()
dat2
# # A tibble: 8 x 3
# id response day
# <int> <date> <int>
# 1 101 2017-11-02 1
# 2 101 2017-11-02 1
# 3 101 2017-11-03 2
# 4 101 2017-11-03 2
# 5 101 2017-11-03 2
# 6 101 2017-11-03 2
# 7 102 2017-12-14 1
# 8 102 2017-12-15 2
DATA
dat <- read.table(text = "id response
101 '11/2/2017'
101 '11/2/2017'
101 '11/3/2017'
101 '11/3/2017'
101 '11/3/2017'
101 '11/3/2017'
102 '12/14/2017'
102 '12/15/2017'",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 2