Reputation: 137
My first dataset contains rows and columns like this below. Some values in the second column (Score_d1
) are missing (NA
).
StudentID Score_d1
012343 NA
081245 NA
957600 78
212945 86
305049 NA
407903 92
Second dataset is as follows. This dataset contains Scores values only for those StudentIds where the Score values were missing in the first dataset. For example, in the first dataset , the score values for StudentID 012343
was missing and the Score_d2
column in this second dataset contains these values ( 75, 85)
StudentID Score_d2
012343 75
012343 85
081245 94
081245 65
305049 46
I want to merge these two datasets by StudentID such that the missing(NA) Score_d1
values from the first dataset is replaced by the Score_d2
values from the second dataset do i merge these two datasets to create a final dataset like this below, one-to-many ??
StudentID Score
012343 75
012343 85
081245 94
081245 65
957600 78
212945 86
305049 46
407903 92
Any help on accomplishing this is much appreciated.
Upvotes: 1
Views: 141
Reputation: 12937
You could do:
m <- merge(df1,df2,by = "StudentID",all = T)
m$Score_d1[is.na(m$Score_d1)] <- m$Score_d2[!is.na(m$Score_d2)]
m[,1:2]
# StudentID Score_d1
# 1 12343 75
# 2 12343 85
# 3 81245 94
# 4 81245 65
# 5 212945 86
# 6 305049 46
# 7 407903 92
# 8 957600 78
The idea is to first join two data frames based on StudentID
and then, replace those NA
values of Score_d1
with non NA
values of Score_d2
.
DATA
df1 <- structure(list(StudentID = c(12343L, 81245L, 957600L, 212945L,
305049L, 407903L), Score_d1 = c(NA, NA, 78L, 86L, NA, 92L)), .Names = c("StudentID",
"Score_d1"), class = "data.frame", row.names = c(NA, -6L))
df2 <- structure(list(StudentID = c(12343L, 12343L, 81245L, 81245L,
305049L), Score_d2 = c(75L, 85L, 94L, 65L, 46L)), .Names = c("StudentID",
"Score_d2"), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 2