Reputation: 601
Suppose we have the following data with column names "id", "time" and "x":
df<-
structure(
list(
id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
time = c(20L, 6L, 7L, 11L, 13L, 2L, 6L),
x = c(1L, 1L, 0L, 1L, 1L, 1L, 0L)
),
.Names = c("id", "time", "x"),
class = "data.frame",
row.names = c(NA,-7L)
)
Each id has multiple observations for time and x. I want to extract the last observation for each id and form a new dataframe which repeats these observations according to the number of observations per each id in the original data. I am able to extract the last observations for each id using the following codes
library(dplyr)
df<-df%>%
group_by(id) %>%
filter( ((x)==0 & row_number()==n())| ((x)==1 & row_number()==n()))
What is left unresolved is the repetition aspect. The expected output would look like
df <-
structure(
list(
id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
time = c(7L, 7L, 7L, 13L, 13L, 6L, 6L),
x = c(0L, 0L, 0L, 1L, 1L, 0L, 0L)
),
.Names = c("id", "time", "x"),
class = "data.frame",
row.names = c(NA,-7L)
)
Thanks for your help in advance.
Upvotes: 4
Views: 215
Reputation: 1101
Using data.table
you can try
library(data.table)
setDT(df)[,.(time=rep(time[.N],.N), x=rep(x[.N],.N)), by=id]
id time x
1: 1 7 0
2: 1 7 0
3: 1 7 0
4: 2 13 1
5: 2 13 1
6: 3 6 0
7: 3 6 0
Following @thelatemai, to avoid name the columns you can also try
df[, .SD[rep(.N,.N)], by=id]
id time x
1: 1 7 0
2: 1 7 0
3: 1 7 0
4: 2 13 1
5: 2 13 1
6: 3 6 0
7: 3 6 0
Upvotes: 2
Reputation: 93833
slice
will be your friend in the tidyverse I reckon:
df %>%
group_by(id) %>%
slice(rep(n(),n()))
## A tibble: 7 x 3
## Groups: id [3]
# id time x
# <int> <int> <int>
#1 1 7 0
#2 1 7 0
#3 1 7 0
#4 2 13 1
#5 2 13 1
#6 3 6 0
#7 3 6 0
In data.table
, you could also use the mult=
argument of a join:
library(data.table)
setDT(df)
df[df[,.(id)], on="id", mult="last"]
# id time x
#1: 1 7 0
#2: 1 7 0
#3: 1 7 0
#4: 2 13 1
#5: 2 13 1
#6: 3 6 0
#7: 3 6 0
And in base R, a merge
will get you there too:
merge(df["id"], df[!duplicated(df$id, fromLast=TRUE),])
# id time x
#1 1 7 0
#2 1 7 0
#3 1 7 0
#4 2 13 1
#5 2 13 1
#6 3 6 0
#7 3 6 0
Upvotes: 2
Reputation: 60080
You can do this by using last()
to grab the last row within each id.
df %>%
group_by(id) %>%
mutate(time = last(time),
x = last(x))
Because last(x)
returns a single value, it gets expanded out to fill all the rows in the mutate()
call.
This can also be applied to an arbitrary number of variables using mutate_at
:
df %>%
group_by(id) %>%
mutate_at(vars(-id), ~ last(.))
Upvotes: 2
Reputation: 389037
We can use ave
to find the max
row number for each ID
and subset it from the data frame.
df[ave(1:nrow(df), df$id, FUN = max), ]
# id time x
#3 1 7 0
#3.1 1 7 0
#3.2 1 7 0
#5 2 13 1
#5.1 2 13 1
#7 3 6 0
#7.1 3 6 0
Upvotes: 4