Reputation: 623
I have a data frame that looks like this:
originalID <- c('A','A','B','B','B','B')
newID <- c('D','G','R','Q','N','P')
newTime <- c(2,4,3,6,7,10)
x <- data.frame(originalID, newID, newTime)
x
originalID newID newTime
1 A D 2
2 A G 4
3 B R 3
4 B Q 6
5 B N 7
6 B P 10
My desired result is a data frame that looks like this:
originalID newID newTime
1 A A 0
2 A A 1
3 A D 2
4 A D 3
5 A G 4
6 B B 0
7 B B 1
8 B B 2
9 B R 3
10 B R 4
11 B R 5
12 B Q 6
13 B N 7
14 B N 8
15 B N 9
16 B P 10
Basically, I'm trying to fill in the sequence before and between each value of newTime
, starting from 0 and ending at the max value of newTime
for each originalID
. I want to repeat the values of the other columns, except from 0 to the first newTime
, for which I want to repeat the value from the originalID
.
Many thanks for your help!
Upvotes: 1
Views: 1253
Reputation: 323226
Try this ~(PS also, add stringsAsFactors=F
in your df)
Data Input
x <- data.frame(originalID, newID, newTime,stringsAsFactors = F)
Solution
library(tidyr)
library(zoo)
library(dplyr)
x=x %>% group_by(originalID) %>% complete(newTime=full_seq(0:max(newTime),1))
x=x %>% group_by(originalID)%>%mutate(newID=zoo::na.locf(newID,na.rm = F))
x$newID[is.na(x$newID)]=x$originalID[is.na(x$newID)]
x
# A tibble: 16 x 3
# Groups: originalID [2]
originalID newTime newID
<chr> <dbl> <chr>
1 A 0 A
2 A 1 A
3 A 2 D
4 A 3 D
5 A 4 G
6 B 0 B
7 B 1 B
8 B 2 B
9 B 3 R
10 B 4 R
11 B 5 R
12 B 6 Q
13 B 7 N
14 B 8 N
15 B 9 N
16 B 10 P
Upvotes: 4