Reputation: 179
I've searched and found lots of solutions that come close but don't quite answer my question.
I want a function that will add 0/1 flags to data, indicating the last observation per unit. The data are grouped by unit and by the kind of test that was done.
I want to use dplyr and have the following attempt, but the second mutate_ call is wrong.
getLastObsFlag <- function(data, id="subject", time="studyday", test="test"){
  data <- arrange_(data, id, test, time) %>%
    mutate_(lastObsFlag = 0) %>%
    group_by_(id, test) %>%
    mutate_(lastObsFlag = replace(time, n(), 1))
  as.data.frame(data)
}
# Restructure pbcseq from the survival package
junk <- gather(pbcseq, test, value, 12:18)
# That just loaded reshape2 and plyr, so unload them
unloadNamespace("reshape2")
unloadNamespace("plyr")
getLastObsFlag(junk, id="id", time="day", test="test")
The call to n() throws an error: Error in dplyr::n() : This function should not be called directly
I've read this is an issue with having plyr attached as well as dplyr (though I expected using dplyr::n() to overcome that). I checked, and plyr is loaded via a namespace (and not attached). I used unloadNamespace to remove it (and reshape2), but I still get the same error message.
I'd be grateful for any pointers. I'm not attached to n(), so an alternative solution would be fine.
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
[6] LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] dmhelp_0.5 brglm_0.5-9 profileModel_0.5-9 dplyr_0.4.3 tidyr_0.2.0 gbm_2.1.1 lattice_0.20-33
[8] survival_2.38-3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.0 assertthat_0.1 MASS_7.3-44 grid_3.2.2 R6_2.1.1 DBI_0.3.1 magrittr_1.5 stringi_0.5-5
[9] lazyeval_0.1.10 tools_3.2.2 stringr_1.0.0
Upvotes: 1
Views: 764
Reputation: 887223
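We can use interp from library(lazyeval). In the original attempt, replace(time, n(), 1) is evaluated as an ordinary R expression before mutate_ ever receives it, so n() is called outside a dplyr verb, hence the error. Wrapping the expression in a formula defers its evaluation until mutate_ processes each group.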
library(lazyeval)
getLastObsFlag <- function(data, id="subject", time="studyday", test="test"){
  data <- arrange_(data, id, test, time) %>%
    mutate_(lastObsFlag = 0) %>%
    group_by_(id, test) %>%
    mutate_(.dots=list(lastObsFlag = interp(~replace(lastObsFlag, n(), 1))))
  as.data.frame(data)
}
Upon testing
head(getLastObsFlag(junk, id="id", time="day", test="test"),25)[c('id', 'test', 'lastObsFlag')]
# id test lastObsFlag
#1 1 bili 0
#2 1 bili 1
#3 1 chol 0
#4 1 chol 1
#5 1 albumin 0
#6 1 albumin 1
#7 1 alk.phos 0
#8 1 alk.phos 1
#9 1 ast 0
#10 1 ast 1
#11 1 platelet 0
#12 1 platelet 1
#13 1 protime 0
#14 1 protime 1
#15 2 bili 0
#16 2 bili 0
#17 2 bili 0
#18 2 bili 0
#19 2 bili 0
#20 2 bili 0
#21 2 bili 0
#22 2 bili 0
#23 2 bili 1
#24 2 chol 0
#25 2 chol 0
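For readers on dplyr 1.0 or later, where the underscore verbs (arrange_, group_by_, mutate_) are deprecated, the same idea can be sketched with across(all_of(...)). This is only an illustration under that version assumption: the name getLastObsFlag2 and the toy data frame are made up for the example and are not part of the original post.

```r
library(dplyr)

# Hypothetical rewrite for dplyr >= 1.0; column names are passed as strings.
getLastObsFlag2 <- function(data, id = "subject", time = "studyday", test = "test") {
  data %>%
    # Sort by unit, test, then time so the last row per group is the latest one
    arrange(across(all_of(c(id, test, time)))) %>%
    group_by(across(all_of(c(id, test)))) %>%
    # Flag the final row of each (id, test) group with 1, all others with 0
    mutate(lastObsFlag = as.integer(row_number() == n())) %>%
    ungroup() %>%
    as.data.frame()
}

# Toy data (invented for illustration), deliberately unsorted
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  test = c("bili", "bili", "chol", "bili", "bili"),
  day  = c(192, 0, 0, 0, 50)
)

res <- getLastObsFlag2(toy, id = "id", time = "day", test = "test")
res
```

Here n() is only ever evaluated inside mutate, so the "should not be called directly" error does not arise, and no lazyeval call is needed.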
Upvotes: 0
Reputation: 306
We can add a variable to the entire data frame with ifelse and a dplyr window function inside mutate.
junk <- junk %>%
  group_by(id) %>%
  arrange(day) %>%
  mutate(flag = ifelse(min_rank(desc(day)) != 1, 0, 1))
testing results...
id futime status trt age sex day ascites hepato spiders edema stage test value flag
1 1 400 2 1 58.76523 f 0 1 1 1 1 4 bili 14.50 0
2 1 400 2 1 58.76523 f 0 1 1 1 1 4 chol 261.00 0
3 1 400 2 1 58.76523 f 0 1 1 1 1 4 albumin 2.60 0
4 1 400 2 1 58.76523 f 0 1 1 1 1 4 alk.phos 1718.00 0
5 1 400 2 1 58.76523 f 0 1 1 1 1 4 ast 138.00 0
6 1 400 2 1 58.76523 f 0 1 1 1 1 4 platelet 190.00 0
7 1 400 2 1 58.76523 f 0 1 1 1 1 4 protime 12.20 0
8 1 400 2 1 58.76523 f 192 1 1 1 1 4 bili 21.30 1
9 1 400 2 1 58.76523 f 192 1 1 1 1 4 chol NA 1
10 1 400 2 1 58.76523 f 192 1 1 1 1 4 albumin 2.94 1
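Note that the snippet above groups by id only, so a row is flagged when its day is the latest for the whole subject. If some tests could stop earlier than others, a variant that also groups by test may be closer to what the question asks for. A minimal sketch on a made-up toy data frame (the data and the name flagged are assumptions for illustration):

```r
library(dplyr)

# Toy data (invented): for id 1, chol was last measured on day 192, bili on day 400
toy <- data.frame(
  id   = c(1, 1, 1, 1, 1),
  test = c("bili", "bili", "bili", "chol", "chol"),
  day  = c(0, 192, 400, 0, 192)
)

flagged <- toy %>%
  group_by(id, test) %>%
  # min_rank(desc(day)) is 1 for the largest day within each (id, test) group
  mutate(flag = ifelse(min_rank(desc(day)) != 1, 0, 1)) %>%
  ungroup() %>%
  arrange(id, test, day)

flagged
```

With grouping by id alone, neither chol row would be flagged here, because the subject's latest day (400) belongs to bili. Also note that min_rank gives tied days the same rank, so if a group has several rows on its last day, all of them get flag 1.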
Upvotes: 2