Reputation: 35
I am working with a dataset where each row contains a separate observation of an individual (ID). I would like R to look for the first and last observation dates and calculate the total number of "activity days", putting these values in a new column. When I run the code below, I get one column with the same number for every entry — the total number of days between the earliest and latest days of the entire data set, not per individual.
I’m using dplyr and lubridate. I used this page (difference between the first date and last date within same individual in R) for code assistance, but it isn't working for me.
Can anyone help me figure out where I’m going wrong?
data$date <- mdy(data$date)
data <- data %>%
group_by(ID) %>%
mutate(total.activity.days = max((date)) - min((date)))
cage date ID total.activity.days
1 1 2018-04-30 40 54
2 1 2018-04-30 76 54
3 1 2018-05-02 40 54
4 1 2018-05-02 76 54
5 1 2018-05-04 40 54
6 1 2018-05-04 76 54
7 1 2018-04-28 59 54
8 1 2018-04-29 59 54
Upvotes: 1
Views: 681
Reputation: 1354
All that is needed is:
df %>% group_by(ID) %>% mutate(total.activity.days = max(date) - min(date) )
Which gives:
cage date ID total.activity.days
<int> <date> <int> <time>
1 1 2018-04-30 40 4
2 1 2018-04-30 76 4
3 1 2018-05-02 40 4
4 1 2018-05-02 76 4
5 1 2018-05-04 40 4
6 1 2018-05-04 76 4
7 1 2018-04-28 59 1
8 1 2018-04-29 59 1
Upvotes: 1
Reputation: 1376
Try this out:
data_new <- data %>%
group_by(ID) %>%
summarise(MaxDate = max(date),
MinDate = min(date)) %>%
mutate(total.activity.days = MaxDate - MinDate) %>%
select(ID, total.activity.days)
Upvotes: 1