Diggy Detroit
Diggy Detroit

Reputation: 137

Merging two datasets

My first dataset contains rows and columns like this below. Some values in the second column (Score_d1) are missing (NA).

  StudentID     Score_d1
  012343        NA
  081245        NA
  957600        78
  212945        86
  305049        NA
  407903        92

Second dataset is as follows. This dataset contains Scores values only for those StudentIds where the Score values were missing in the first dataset. For example, in the first dataset , the score values for StudentID 012343 was missing and the Score_d2 column in this second dataset contains these values ( 75, 85)

  StudentID     Score_d2
  012343        75
  012343        85
  081245        94
  081245        65
  305049        46

I want to merge these two datasets by StudentID such that the missing(NA) Score_d1 values from the first dataset is replaced by the Score_d2 values from the second dataset do i merge these two datasets to create a final dataset like this below, one-to-many ??

  StudentID     Score
  012343        75
  012343        85
  081245        94
  081245        65
  957600        78
  212945        86
  305049        46
  407903        92

Any help on accomplishing this is much appreciated.

Upvotes: 1

Views: 141

Answers (1)

989
989

Reputation: 12937

You could do:

m <- merge(df1,df2,by = "StudentID",all = T)
m$Score_d1[is.na(m$Score_d1)] <- m$Score_d2[!is.na(m$Score_d2)]
m[,1:2]

  # StudentID Score_d1
# 1     12343       75
# 2     12343       85
# 3     81245       94
# 4     81245       65
# 5    212945       86
# 6    305049       46
# 7    407903       92
# 8    957600       78

The idea is to first join two data frames based on StudentID and then, replace those NA values of Score_d1 with non NA values of Score_d2.

DATA

df1 <- structure(list(StudentID = c(12343L, 81245L, 957600L, 212945L, 
305049L, 407903L), Score_d1 = c(NA, NA, 78L, 86L, NA, 92L)), .Names = c("StudentID", 
"Score_d1"), class = "data.frame", row.names = c(NA, -6L))

df2 <- structure(list(StudentID = c(12343L, 12343L, 81245L, 81245L, 
305049L), Score_d2 = c(75L, 85L, 94L, 65L, 46L)), .Names = c("StudentID", 
"Score_d2"), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 2

Related Questions