harrys

Reputation: 179

Creating last observation flags for grouped data with dplyr

I've searched and found lots of solutions that come close but don't quite answer my question.

I want a function that will add 0/1 flags to data, indicating the last observation per unit. The data are grouped by unit and by the kind of test that was done.

I want to use dplyr. My attempt is below, but the second mutate_ call is wrong.

getLastObsFlag <- function(data, id="subject", time="studyday", test="test"){
  data <- arrange_(data, id, test, time) %>%
    mutate_(lastObsFlag = 0) %>%
    group_by_(id, test) %>%
    mutate_(lastObsFlag = replace(time, n(), 1))

  as.data.frame(data)
}

# Restructure pbcseq from the survival package
junk <- gather(pbcseq, test, value, 12:18)
# That just loaded reshape2 and plyr, so unload them
unloadNamespace("reshape2")
unloadNamespace("plyr")
getLastObsFlag(junk, id="id", time="day", test="test")

The call to n() throws an error: Error in dplyr::n() : This function should not be called directly

I've read that this can happen when plyr is attached alongside dplyr (though I expected dplyr::n() to get around that). I checked, and plyr was loaded via a namespace but not attached. I used unloadNamespace to remove it (and reshape2), but I still get the same error message.

I'd be grateful for any pointers. I'm not attached to n(), so an alternative solution would be fine.

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8   
 [6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dmhelp_0.5         brglm_0.5-9        profileModel_0.5-9 dplyr_0.4.3        tidyr_0.2.0        gbm_2.1.1          lattice_0.20-33   
[8] survival_2.38-3   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.0     assertthat_0.1  MASS_7.3-44     grid_3.2.2      R6_2.1.1        DBI_0.3.1       magrittr_1.5    stringi_0.5-5  
 [9] lazyeval_0.1.10 tools_3.2.2     stringr_1.0.0  

Upvotes: 1

Views: 764

Answers (2)

akrun

Reputation: 887223

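We can use interp from the lazyeval package and pass the expression to mutate_ through .dots as a formula, so that n() is evaluated inside mutate_ rather than when the argument is built.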

library(lazyeval)
getLastObsFlag <- function(data, id="subject", time="studyday", test="test"){
  data <- arrange_(data, id, test, time) %>%
    mutate_(lastObsFlag = 0) %>%
    group_by_(id, test) %>%
    # The formula delays evaluation, so n() is the size of each group
    mutate_(.dots=list(lastObsFlag = interp(~replace(lastObsFlag, n(), 1))))

  as.data.frame(data)
}

Testing on the example data:

head(getLastObsFlag(junk, id="id", time="day", test="test"),25)[c('id', 'test', 'lastObsFlag')]
#   id     test lastObsFlag
#1   1     bili           0
#2   1     bili           1
#3   1     chol           0
#4   1     chol           1
#5   1  albumin           0
#6   1  albumin           1
#7   1 alk.phos           0
#8   1 alk.phos           1
#9   1      ast           0
#10  1      ast           1
#11  1 platelet           0
#12  1 platelet           1
#13  1  protime           0
#14  1  protime           1
#15  2     bili           0
#16  2     bili           0
#17  2     bili           0
#18  2     bili           0
#19  2     bili           0
#20  2     bili           0
#21  2     bili           0
#22  2     bili           0
#23  2     bili           1
#24  2     chol           0
#25  2     chol           0
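The underscore verbs (arrange_, mutate_, group_by_) have been deprecated since dplyr 0.7.0, so the function above may warn or fail on current installations. The original error most likely arises because mutate_ evaluates `replace(time, n(), 1)` as an ordinary argument, which calls n() outside any dplyr verb. The following is a minimal sketch of the same idea without lazyeval, assuming dplyr 1.0.0 or later; the name add_last_obs_flag and the toy data are hypothetical and not part of the original post.

```r
library(dplyr)

# Hypothetical helper: flag the last row of each (id, test) group by time.
# Column names are passed as strings, as in the original function.
add_last_obs_flag <- function(data, id = "subject", time = "studyday", test = "test") {
  # Sort with base R so that no tidy evaluation is needed for the ordering
  data <- data[order(data[[id]], data[[test]], data[[time]]), , drop = FALSE]

  data %>%
    group_by(across(all_of(c(id, test)))) %>%
    # row_number() == n() is TRUE only for the final row of each group
    mutate(lastObsFlag = as.integer(row_number() == n())) %>%
    ungroup() %>%
    as.data.frame()
}

# Toy data (made up for illustration), deliberately out of order
toy <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2),
  test = c("bili", "bili", "chol", "chol", "bili", "bili"),
  day  = c(192, 0, 0, 192, 0, 182)
)

res <- add_last_obs_flag(toy, id = "id", time = "day", test = "test")
```

If several rows in a group share the latest time, only the one that sorts last is flagged.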

Upvotes: 0

misspelled

Reputation: 306

We can add a variable to the entire data frame with ifelse and a dplyr window function (min_rank) inside mutate.

junk <- junk %>% group_by(id) %>% arrange(day) %>% mutate(flag = ifelse(min_rank(desc(day))!=1,0,1))

testing results...

   id futime status trt      age sex day ascites hepato spiders edema stage     test   value flag
1   1    400      2   1 58.76523   f   0       1      1       1     1     4     bili   14.50    0
2   1    400      2   1 58.76523   f   0       1      1       1     1     4     chol  261.00    0
3   1    400      2   1 58.76523   f   0       1      1       1     1     4  albumin    2.60    0
4   1    400      2   1 58.76523   f   0       1      1       1     1     4 alk.phos 1718.00    0
5   1    400      2   1 58.76523   f   0       1      1       1     1     4      ast  138.00    0
6   1    400      2   1 58.76523   f   0       1      1       1     1     4 platelet  190.00    0
7   1    400      2   1 58.76523   f   0       1      1       1     1     4  protime   12.20    0
8   1    400      2   1 58.76523   f 192       1      1       1     1     4     bili   21.30    1
9   1    400      2   1 58.76523   f 192       1      1       1     1     4     chol      NA    1
10  1    400      2   1 58.76523   f 192       1      1       1     1     4  albumin    2.94    1
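Note that this groups by id only, so it flags every row on a subject's last day regardless of test. That matches a per-test flag only when every test is recorded on the same days. A small sketch of the difference, using made-up toy data (not pbcseq) and assuming a reasonably recent dplyr:

```r
library(dplyr)

# Toy data (hypothetical): chol was measured only on day 0
toy <- data.frame(
  id   = c(1, 1, 1),
  test = c("bili", "bili", "chol"),
  day  = c(0, 192, 0)
)

# Grouping by id: only rows on the subject's latest day are flagged
by_id <- toy %>%
  group_by(id) %>%
  mutate(flag = ifelse(min_rank(desc(day)) != 1, 0, 1)) %>%
  ungroup()

# Grouping by id and test: the latest row of each test is flagged
by_id_test <- toy %>%
  group_by(id, test) %>%
  mutate(flag = ifelse(min_rank(desc(day)) != 1, 0, 1)) %>%
  ungroup()
```

Here by_id leaves the single chol row unflagged, while by_id_test flags it. Because min_rank gives tied values the same rank, rows sharing the latest day within a group are all flagged.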

Upvotes: 2
