Reputation: 11
I'm working with a large time series dataset. I have multiple individuals (id) that were assayed on an hourly basis (hour) over many days (dates). However, not all individuals were observed on the same dates. I would like to create a new variable (obs) that lists the dates of each individuals from 1:n, so that each hourly assay in the same day gets the same number.
I thought I could do this easily in dplyr by using the group_by(id, date) and mutate to count the length of each id's dates, but this just replicates the 'hour' variable which I don't want.
# what i have
id <- rep(c("id1", "id2"), each = 6)
date <- as.Date(rep(c("2018-3-13", "2018-3-14", "2018-4-11", "2018-4-12"), each = 3))
hour <- rep(1:3, 4)
data.have <- data.frame(id, date, hour)
# attempt 1 - just replicates 'hour' which I don't want
data.have %>%
group_by(id, date) %>%
arrange(date) %>%
mutate(obs = 1:length(date))
# what i want
obs <- rep(1:2, each =3, times = 2)
data.want <- data.frame(id, date, hour, obs)
´´´
Upvotes: 1
Views: 200
Reputation: 17090
I would take another approach for reasons which are not technical: I would introduce a relative date, number of days since the first observation:
data.have %>% group_by(id) %>% arrange(date) %>% mutate(reldate=date - date[1])
There are several reasons for that, but the most important one is that in my experience, such clinical data will need a relative date for the purposes of analysis sooner or later, so better generate it sooner than later.
Actually, this is probably the value you desire.
data.want <- data.have %>% group_by(id) %>%
arrange(date) %>%
mutate(reldate=date - date[1]) %>%
mutate(times=as.numeric(reldate + 1))
Assuming, of course, that the measurements were taken daily. If not, you can create the times variable by converting the reldate to factor and taking it as numeric vector.
Upvotes: 0
Reputation: 7626
A slightly cheat-y solution would be to group by id then convert date to a factor and back to a numeric:
data.have %>%
group_by(id) %>%
mutate(obs = as.numeric(as.factor(date)))
Upvotes: 0
Reputation: 4284
You can do it using dense_rank
and group only by ID
data.have %>%
group_by(id) %>%
mutate(obs=dense_rank(date))
Upvotes: 1