Reputation: 7725
I want to group by person
and date1
and fill in missing data for date2
and indicator
by person and day IF with the person's next observation occurs in the same day.
For instance, person
1 is missing date2
and indicator
values for the second and third observations. As shown below, I want to replace these missing values with the next non-NA
observation in the same day for this person: date2==2018-02-02 15:04:00
and indicator==1
.
Note that for person
2, the last NA
does not have a next observation in the same day, so it needs to remain NA
.
Here is the data frame I have:
person date1 date2 indicator
1 1 2018-02-02 12:00:00 2018-02-02 12:05:00 1
2 1 2018-02-02 13:00:00 <NA> NA
3 1 2018-02-02 14:00:00 <NA> NA
4 1 2018-02-02 15:00:00 2018-02-02 15:04:00 1
5 2 2018-02-01 12:00:00 <NA> NA
6 2 2018-02-01 13:00:00 2018-02-01 13:06:00 1
7 2 2018-02-02 12:00:00 2018-02-02 12:03:00 1
8 2 2018-02-03 12:00:00 <NA> NA
Here is the data frame I want:
person date1 date2 indicator
1 1 2018-02-02 12:00:00 2018-02-02 12:05:00 1
2 1 2018-02-02 13:00:00 2018-02-02 15:04:00 1
3 1 2018-02-02 14:00:00 2018-02-02 15:04:00 1
4 1 2018-02-02 15:00:00 2018-02-02 15:04:00 1
5 2 2018-02-01 12:00:00 2018-02-01 13:06:00 1
6 2 2018-02-01 13:00:00 2018-02-01 13:06:00 1
7 2 2018-02-02 12:00:00 2018-02-02 12:03:00 1
8 2 2018-02-03 12:00:00 <NA> NA
Example:
library(tidyverse)
df.have <- data.frame(person=c(1, 1, 1, 1, 2, 2, 2, 2),
date1=ymd_hms(c("2018-02-02 12:00:00",
"2018-02-02 13:00:00",
"2018-02-02 14:00:00",
"2018-02-02 15:00:00",
"2018-02-01 12:00:00",
"2018-02-01 13:00:00",
"2018-02-02 12:00:00",
"2018-02-03 12:00:00")),
date2=ymd_hms(c("2018-02-02 12:05:00",
NA,
NA,
"2018-02-02 15:04:00",
NA,
"2018-02-01 13:06:00",
"2018-02-02 12:03:00",
NA)),
indicator=c(1, NA, NA, 1,
NA, 1, 1, NA))
df.want <- data.frame(person=c(1, 1, 1, 1, 2, 2, 2, 2),
date1=ymd_hms(c("2018-02-02 12:00:00",
"2018-02-02 13:00:00",
"2018-02-02 14:00:00",
"2018-02-02 15:00:00",
"2018-02-01 12:00:00",
"2018-02-01 13:00:00",
"2018-02-02 12:00:00",
"2018-02-03 12:00:00")),
date2=ymd_hms(c("2018-02-02 12:05:00",
"2018-02-02 15:04:00",
"2018-02-02 15:04:00",
"2018-02-02 15:04:00",
"2018-02-01 13:06:00",
"2018-02-01 13:06:00",
"2018-02-02 12:03:00",
NA)),
indicator=c(1, 1, 1, 1,
1, 1, 1, NA))
I can filter down to some of the replacement values, but still a good bit from where I want to get.
df.have %>%
group_by(person, date(date1)) %>%
arrange(person, date1) %>%
filter(row_number() %in% c(n()))
Upvotes: 1
Views: 126
Reputation: 18425
You can do it like this (note that you also need lubridate
as well as the tidyverse
packages)...
df.want <- df.have %>% mutate(day=date(date1)) %>% #add a date variable for grouping
group_by(day,person) %>%
fill(date2,indicator,.direction = "up") %>% #use tidyr 'fill' to remove NAs
ungroup() %>%
select(-day) %>% #remove grouping variable
arrange(person,date1) #restore original order
df.want
# A tibble: 8 x 4
person date1 date2 indicator
<dbl> <dttm> <dttm> <dbl>
1 1 2018-02-02 12:00:00 2018-02-02 12:05:00 1
2 1 2018-02-02 13:00:00 2018-02-02 15:04:00 1
3 1 2018-02-02 14:00:00 2018-02-02 15:04:00 1
4 1 2018-02-02 15:00:00 2018-02-02 15:04:00 1
5 2 2018-02-01 12:00:00 2018-02-01 13:06:00 1
6 2 2018-02-01 13:00:00 2018-02-01 13:06:00 1
7 2 2018-02-02 12:00:00 2018-02-02 12:03:00 1
8 2 2018-02-03 12:00:00 NA NA
Upvotes: 2