Reputation: 493
Another R question. Have looked through the data.table vignettes and seen solutions like these:
But unfortunately while they're close, I'm somehow missing something in my understanding.
My initial data tables include one that includes results and another with standards. Several columns are common between the two tables. Here's a sample (more columns exist for both tables, but they are not common between the two).
Results
ID Region Locale Medium Name Method
3324 Agate Zone C water Cadmium Z
2432 Gneiss Zone B air Calcium R
2433 Agate Zone A water Molybdenum Q
78882 Agate Zone D water Iron M
Standards
ID Region Locale Medium Name CoeffA CoeffB
3214 Agate Zone A water Cadmium -.243 1.43
3324 Agate Zone C water Cadmium -.243 1.43
2432 Gneiss Zone B water Calcium .432 0.44
78882 Agate Zone D water Iron 1.475 0
There are many more results than standards and some results have no standards.
What I'd like to do is add the mathematical coefficient values of the standards table to the results table as new columns (C-a
and C-b
). Ultimately I'll use these to calculate comparative standard values.
Results
ID Region Locale Medium Name Method C-a C-b
3324 Agate Zone C water Cadmium Z -.243 1.43
2432 Gneiss Zone B air Calcium R .432 0.44
2433 Agate Zone A water Molybdenum Q NA NA
78882 Agate Zone D water Iron M 1.475 0
I've tried the following without success:
Results[Standards]
yields the standards values with result columns as NA
Standards[Results]
yields the results values with standards columns as NA
merge(Results,Standards)
after using setkey(c("ID","Region","Locale","Medium"))
for the common key columns for Results
and Standards
, yields standards values with result columns as NA
I would have thought that one of these syntaxes would definitely have yielded coefficient columns with values other than NA
.
Any suggestions on where I should look or what I'm missing?
Thanks in advance for your kind assistance.
Upvotes: 3
Views: 13763
Reputation: 309
Try this, you can perform it without setkey as follows
require(data.table)
newResults <- merge(x = Results, y = Standards, by = "ID", all.x = TRUE)
setnames(newResults,"CoeffA","C-a")
setnames(newResults,"CoeffB","C-b")
newResults
ID Region Locale Medium Name Method C-a C-b
2432 Gneiss Zone B air Calcium R .432 0.44
2433 Agate Zone A water Molybdenum Q NA NA
3324 Agate Zone C water Cadmium Z -.243 1.43
78882 Agate Zone D water Iron M 1.475 0
If you don't want NAs:
newResults[is.na(newResults)] <- 0 #replace NA with Zero
newResults[is.na(newResults)] <- "No value available" #replace NA with Text
Upvotes: 4
Reputation: 6372
First off, setkey can't be used for multiple variables, you need to use setkeyv instead.
setkeyv(Results,c("ID","Region","Locale","Medium"))
setkeyv(Standards,c("ID","Region","Locale","Medium"))
Then:
JoinedDT <- merge(Results,Standards, all.x = TRUE)
This will give an NA
in any results row that does not have a Standards Row. If there are multiple Standards rows for one Results row, you will get two rows in your resulting data table.
To set NA
to 0
:
JoinedDT[is.na(JoinedDT$CoeffA),CoeffA:= 0]
JoinedDT[is.na(JoinedDT$CoeffB),CoeffB:= 0]
Upvotes: 1