Earthshaker
Earthshaker

Reputation: 599

Comparing values in multiple columns in R dataframes & updating missing values

I have 3 dataframes . first df contains one column - Name -

df 1
    Name 
    A    
    B    
    C    
    D    
    E    
    F    
    G  
    H
    I
    J
    K   

Second df contains two columns - Name and counts but some of the Names may or may not be missing from first df.

df 2 - 
  Name   Counts 
    A    12
    B    23
    C    34
    D    56
    E    34
    K    44

I want compare all Names from second df to first df , If none of the names are missing , then fine. If any name is missing then that name and its count has to be filled from third df . The third df will always have names and counts available in it.

df 3 - 
 Name   Counts 
    A    34
    B    45
    C    34
    D    56
    E    67
    F    435
    G    45
    H    76
    I    76
    J    88
    K    90

So in above example Since F, G, H , I, J are missing in second df , their info should be added from df 3 .

and second df should be updated as -

Name   Counts 
    A    12
    B    23
    C    34
    D    56
    E    34
    F    435
    G    45
    H    76
    I    76
    J    88
    K    44

Any help here would be great

Thanks

Upvotes: 1

Views: 514

Answers (1)

Frank
Frank

Reputation: 66819

You can do...

library(data.table)
setDT(DF1); setDT(DF2); setDT(DF3)

DF1[, n := unique(rbind(DF2, DF3), by="Name")[.(.SD$Name), on=.(Name), x.Counts]]

which adds a column to DF1:

    Name   n
 1:    A  12
 2:    B  23
 3:    C  34
 4:    D  56
 5:    E  34
 6:    F 435
 7:    G  45
 8:    H  76
 9:    I  76
10:    J  88
11:    K  44

You could instead do merge(DF1, unique(rbind(DF2, DF3), by="Name"), all.x=TRUE), though that would create a new table instead of adding a column to an existing table. The dplyr analogue of this merge is left_join(DF1, bind_rows(DF2, DF3) %>% distinct(Name)).

How it works

  • DF = rbind(DF2, DF3) appends the two source tables
  • uDF = unique(DF, by="Name") keeps the first row for each Name
  • DF1[, n := z] adds column n with values z to DF1
  • z = x[i, on=, x.v] uses i to look up rows of x then returns column v, where...
    • x = uDF
    • v = Counts
    • i = .SD$Name is the vector of names found in DF1

.SD in j of DT[i, j] refers to DT itself, the "Subset of Data".

Upvotes: 2

Related Questions