Reputation: 71
I have a data frame in R like this. I would like to extract the last visit for each subject.
SUBJID VISIT
40161 3
40161 4
40161 5
40161 6
40161 9
40201 3
40202 6
40202 8
40241 3
40241 4
The desired output is as follows
SUBJID VISIT
40161 9
40201 3
40202 8
How should I do this in R? Thanks very much for your help.
Upvotes: 6
Views: 4178
Reputation: 805
It can be using sqldf
package too,
library(sqldf)
sqldf("SELECT SUBJID, MAX(VISIT) From df GROUP BY by SUBJID")
SUBJID VISIT
1 40161 9
2 40201 3
3 40202 8
4 40241 4
Upvotes: 1
Reputation: 66819
Alternately (with @agstudy's data),
g <- grouping(df$SUBJID)
df[g[attr(g, "ends")],]
SUBJID VISIT
5 40161 9
6 40201 3
8 40202 8
10 40241 4
or with data.table
library(data.table)
unique(setDT(df), by="SUBJID", fromLast=TRUE)
SUBJID VISIT
1: 40161 9
2: 40201 3
3: 40202 8
4: 40241 4
Upvotes: 0
Reputation: 15441
Because we can, another base option:
do.call(rbind,
lapply(split(dat, dat$SUBJID),
function(x) tail(x$VISIT, 1) ) )
# [,1]
#40161 9
#40201 3
#40202 8
#40241 4
EDIT
As @BenBolker suggests:
do.call(rbind,
lapply(split(dat, dat$SUBJID),
function(x) tail(x, 1) ) )
should work for all columns if you have more.
Upvotes: 3
Reputation: 3364
While agstudy is correct, there is another way with the stats package and the aggregate function.
df <- read.table(text="SUBJID VISIT
40161 3
40161 4
40161 5
40161 6
40161 9
40201 3
40202 6
40202 8
40241 3
40241 4", header=TRUE)
aggregate(VISIT ~ SUBJID, df, max)
SUBJID VISIT
1 40161 9
2 40201 3
3 40202 8
4 40241 4
Upvotes: 6
Reputation: 81693
Here's a simple solution with diff
:
dat[c(diff(dat$SUBJID) != 0, TRUE), ]
SUBJID VISIT
5 40161 9
6 40201 3
8 40202 8
10 40241 4
It's also possible with by
:
do.call(rbind, by(dat, dat$SUBJID, tail, 1))
SUBJID VISIT
40161 40161 9
40201 40201 3
40202 40202 8
40241 40241 4
Upvotes: 1
Reputation: 193527
To show another alternative, because I like the simplicity of its syntax, you can use data.table
too. Assuming your data.frame
is called "df":
library(data.table)
# data.table 1.8.7 For help type: help("data.table")
DT <- data.table(df, key = "SUBJID")
DT[, list(VISIT = max(VISIT)), by = key(DT)]
# SUBJID V1
# 1: 40161 9
# 2: 40201 3
# 3: 40202 8
# 4: 40241 4
And, while we are sharing the many ways to do this in R, if you're comfortable with SQL syntax, you can also use sqldf
as follows:
library(sqldf)
sqldf("select SUBJID, max(VISIT) `VISIT` from df group by SUBJID")
SUBJID VISIT
1 40161 9
2 40201 3
3 40202 8
4 40241 4
Upvotes: 4
Reputation: 121568
Using plyr
package for exmple:
ddply(dat,.(SUBJID),summarise,VISIT=tail(VISIT,1))
SUBJID VISIT
1 40161 9
2 40201 3
3 40202 8
4 40241 4
Where dat is :
dat <- read.table(text ='SUBJID VISIT
40161 3
40161 4
40161 5
40161 6
40161 9
40201 3
40202 6
40202 8
40241 3
40241 4',head=T)
Upvotes: 1