Reputation: 599
I have 3 dataframes . first df contains one column - Name -
df 1
Name
A
B
C
D
E
F
G
H
I
J
K
Second df contains two columns - Name and counts but some of the Names may or may not be missing from first df.
df 2 -
Name Counts
A 12
B 23
C 34
D 56
E 34
K 44
I want compare all Names from second df to first df , If none of the names are missing , then fine. If any name is missing then that name and its count has to be filled from third df . The third df will always have names and counts available in it.
df 3 -
Name Counts
A 34
B 45
C 34
D 56
E 67
F 435
G 45
H 76
I 76
J 88
K 90
So in above example Since F, G, H , I, J are missing in second df , their info should be added from df 3 .
and second df should be updated as -
Name Counts
A 12
B 23
C 34
D 56
E 34
F 435
G 45
H 76
I 76
J 88
K 44
Any help here would be great
Thanks
Upvotes: 1
Views: 514
Reputation: 66819
You can do...
library(data.table)
setDT(DF1); setDT(DF2); setDT(DF3)
DF1[, n := unique(rbind(DF2, DF3), by="Name")[.(.SD$Name), on=.(Name), x.Counts]]
which adds a column to DF1:
Name n
1: A 12
2: B 23
3: C 34
4: D 56
5: E 34
6: F 435
7: G 45
8: H 76
9: I 76
10: J 88
11: K 44
You could instead do merge(DF1, unique(rbind(DF2, DF3), by="Name"), all.x=TRUE)
, though that would create a new table instead of adding a column to an existing table. The dplyr analogue of this merge is left_join(DF1, bind_rows(DF2, DF3) %>% distinct(Name))
.
How it works
DF = rbind(DF2, DF3)
appends the two source tablesuDF = unique(DF, by="Name")
keeps the first row for each Name
DF1[, n := z]
adds column n
with values z
to DF1
z = x[i, on=, x.v]
uses i
to look up rows of x
then returns column v
, where...
x = uDF
v = Counts
i = .SD$Name
is the vector of names found in DF1
.SD
in j
of DT[i, j]
refers to DT
itself, the "Subset of Data".
Upvotes: 2