Mark

Reputation: 91

Carry Forward First Observation for a Variable For Each Patient

My dataset has 3 variables:

Patient ID    Outcome     Duration  
1               1          3
1               0          4
1               0          5
2               0          2
3               1          1
3               1          2

What I want is the first observation for "Duration" for each patient ID to be carried forward.

That is, for patient #1 I want Duration to read 3, 3, 3, and for patient #3 I want it to read 1, 1.

Upvotes: 2

Views: 391

Answers (3)

hrbrmstr

Reputation: 78832

This is a good job for dplyr (a wicked-better successor to plyr for data frames, with far better syntax than data.table):

library(dplyr)

dat %>% 
  group_by(`Patient ID`) %>% 
  mutate(Duration=first(Duration))

## Source: local data frame [6 x 3]
## Groups: Patient ID
## 
##   Patient ID Outcome Duration
## 1          1       1        3
## 2          1       0        3
## 3          1       0        3
## 4          2       0        2
## 5          3       1        1
## 6          3       1        1
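For anyone who wants to run the above end to end, here is a minimal sketch under some assumptions: the object name `dat` and the way it is built are not given in the question, so the `data.frame()` call below is only one plausible reconstruction. `check.names = FALSE` keeps the space in the `Patient ID` column name, which is why the backticks are needed; `result` is a hypothetical name, and the trailing `ungroup()` is optional.

```r
library(dplyr)

# Assumed reconstruction of the question's data; check.names = FALSE
# preserves the space in the "Patient ID" column name.
dat <- data.frame(
  `Patient ID` = c(1, 1, 1, 2, 3, 3),
  Outcome      = c(1, 0, 0, 0, 1, 1),
  Duration     = c(3, 4, 5, 2, 1, 2),
  check.names  = FALSE
)

# Within each patient, overwrite Duration with its first value,
# then drop the grouping so later steps see a plain table.
result <- dat %>%
  group_by(`Patient ID`) %>%
  mutate(Duration = first(Duration)) %>%
  ungroup()
```

Note that `first()` takes the first row as the data are currently ordered, so sort beforehand if row order within a patient is not guaranteed.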

Upvotes: 2

mathematical.coffee

Reputation: 56935

Another alternative uses plyr. If you will be doing lots of operations on your data frame, though, and particularly if it's big, I recommend data.table: it has a steeper learning curve but is well worth it.

library(plyr)
ddply(mydf, .(PatientID), transform, Duration=Duration[1])
#   PatientID Outcome Duration
# 1         1       1        3
# 2         1       0        3
# 3         1       0        3
# 4         2       0        2
# 5         3       1        1
# 6         3       1        1

Upvotes: 0

jazzurro

Reputation: 23574

Here is one way with data.table. You take the first number in Duration and ask R to repeat it for each PatientID.

mydf <- read.table(text = "PatientID    Outcome     Duration  
1               1          3
1               0          4
1               0          5
2               0          2
3               1          1
3               1          2", header = TRUE)

library(data.table)
setDT(mydf)[, Duration := Duration[1L], by = PatientID]
print(mydf)

#   PatientID Outcome Duration
#1:         1       1        3
#2:         1       0        3
#3:         1       0        3
#4:         2       0        2
#5:         3       1        1
#6:         3       1        1
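The same "take the first value and repeat it within each PatientID" idea can also be sketched in base R with `ave()`, in case no extra package is wanted. This is a swapped-in technique, not part of the data.table approach above, and the data frame is rebuilt here only to keep the example self-contained.

```r
# Same data as in the question, as a plain data.frame.
mydf <- data.frame(
  PatientID = c(1, 1, 1, 2, 3, 3),
  Outcome   = c(1, 0, 0, 0, 1, 1),
  Duration  = c(3, 4, 5, 2, 1, 2)
)

# ave() applies FUN to Duration within each PatientID group and
# recycles the single returned value across that group's rows.
mydf$Duration <- ave(mydf$Duration, mydf$PatientID,
                     FUN = function(x) x[1])
```

Unlike the `:=` call, this does not modify the data by reference; it reassigns the column on an ordinary data.frame.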

Upvotes: 5
