Reputation: 3640
I have two data frames:
df1 <- data.frame(index = c(rep(2,5), rep(3,8), rep(4,6), rep(5,9)),
start = c(0:4, 0:7, 0:5, 0:8),
end = c(1:5, 1:8, 1:6, 1:9),
v1= rep(1,28))
df2 <- data.frame(index = c(rep(2,5), rep(3,7), rep(4,6), rep(5,6)),
v1 = rnorm(24))
Assume the index
column represents the different observations of a person.
In this example in df1
we have a person with id 2 and 5 observations,
a person with id 3 has 8 observations and so on.
I now want to replace the values of v1
in df1
by the values of v1
in df2
. df2 has sometimes fewer observations than df1, e.g. for persons 3 and 5.
Merging should be in a way that the v1's of df2 are taken for the first observations and if no more observations are available then NA
s should be replaced.
The structure of the df1 frame should be preserved, except that the v1's should be replaced with that from df2.
Upvotes: 1
Views: 397
Reputation: 36076
To do this via a join, you'd need a within-group index so you have an unique identifier to match on.
This is straightforward to create if df2
is already in the correct order and so any NA values after joining should be at the end of each group. You can add an end
variable to df2
and the join on index
and end
.
I also remove v1
from the first dataset, df1
.
library(dplyr)
df2 %>%
group_by(index) %>%
mutate(end = 1:n()) %>%
left_join(select(df1, -v1), .)
index start end v1
1 2 0 1 -1.57254316
2 2 1 2 -1.09000063
3 2 2 3 -0.13775401
4 2 3 4 -0.94088741
5 2 4 5 0.45180766
6 3 0 1 2.14925746
7 3 1 2 1.79103360
8 3 2 3 -0.28344963
9 3 3 4 -0.10250375
10 3 4 5 0.21321949
11 3 5 6 0.07982287
12 3 6 7 0.37987963
13 3 7 8 NA
14 4 0 1 0.41360521
15 4 1 2 -0.28888575
16 4 2 3 -0.58786830
17 4 3 4 0.14340062
18 4 4 5 1.46434162
19 4 5 6 1.73668914
20 5 0 1 1.43112233
21 5 1 2 0.40772442
22 5 2 3 -1.77085141
23 5 3 4 -0.94653948
24 5 4 5 -1.54412791
25 5 5 6 -0.28174510
26 5 6 7 NA
27 5 7 8 NA
28 5 8 9 NA
As pointed out in the comments by @SteveBeaupre, you could replace my old habit of 1:n()
with row_number()
.
Upvotes: 2