Reputation: 861
I have a data table, A
like:
year location sigma_NN_1 sigma_NN_2 sigma_NN_3
2076 43.59375_-116.78125 1.4681173 1.664289 1.735974
2077 43.59375_-116.78125 1.3798515 1.550524 1.551269
2078 43.59375_-116.78125 0.7934367 1.064248 1.177981
2079 43.59375_-116.78125 1.8235574 1.991018 2.288402
2080 43.59375_-116.78125 2.5560329 2.578093 2.589334
And I want to use it to mask another data table where the value of sigmas are below a threshold, lets say 2. Lets say my second data table is B
year location location_NN_1 location_NN_2 location_NN_3
2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 41.71875_-74.15625
2080 43.59375_-116.78125 41.34375_-90.78125 41.96875_-86.21875 41.21875_-90.65625
So, I want to have something like B[A<2]
, but obviously this does not work, otherwise, I would not be here.
Any suggestions?
Expected output:
output
year location location_NN_1 location_NN_2 location_NN_3
2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 NA
2080 43.59375_-116.78125 NA NA NA
The goal is to find locations whose corresponding sigma in data table A
is less than 2.
Upvotes: 4
Views: 1456
Reputation: 887088
Assuming that these data.table
objects, and assuming that the rows of the 'sigma' columns in 'A' should all be less than the threshold 2.
library(data.table)
nm1 <- grep("sigma", names(A), value = TRUE)
i1 <- setDT(A)[, Reduce(`&`, lapply(.SD, `<`, 2)), .SDcols = nm1]
setDT(B)[i1]
Based on the expected output
nm2 <- grep("sigma", names(A))
B[, (nm2) := Map(function(x, y) replace(x, y >= 2, NA_character_),
.SD, A[, nm2, with = FALSE]), .SDcols = nm2][]
# year location location_NN_1 location_NN_2 location_NN_3
#1: 2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
#2: 2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
#3: 2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
#4: 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 <NA>
#5: 2080 43.59375_-116.78125 <NA> <NA> <NA>
A <- structure(list(year = 2076:2080, location = c("43.59375_-116.78125",
"43.59375_-116.78125", "43.59375_-116.78125", "43.59375_-116.78125",
"43.59375_-116.78125"), sigma_NN_1 = c(1.4681173, 1.3798515,
0.7934367, 1.8235574, 2.5560329), sigma_NN_2 = c(1.664289, 1.550524,
1.064248, 1.991018, 2.578093), sigma_NN_3 = c(1.735974, 1.551269,
1.177981, 2.288402, 2.589334)), class = "data.frame", row.names = c(NA,
-5L))
B <- structure(list(year = 2076:2080, location = c("43.59375_-116.78125",
"43.59375_-116.78125", "43.59375_-116.78125", "43.59375_-116.78125",
"43.59375_-116.78125"), location_NN_1 = c("41.15625_-90.65625",
"43.34375_-78.15625", "41.34375_-90.78125", "43.53125_-116.78125",
"41.34375_-90.78125"), location_NN_2 = c("41.21875_-90.65625",
"43.34375_-78.21875", "41.21875_-90.65625", "41.34375_-90.78125",
"41.96875_-86.21875"), location_NN_3 = c("41.15625_-90.65625",
"43.28125_-78.15625", "41.53125_-73.96875", "41.71875_-74.15625",
"41.21875_-90.65625")), class = "data.frame", row.names = c(NA,
-5L))
Upvotes: 3
Reputation: 9705
Simple base R solution:
B[-(1:2)][A[-(1:2)]>=2] <- NA
Select all columns except first and second B[-(1:2)]
.
Then use the vectorized logical expression A[-(1:2)]>=2
to set the correct elements to NA
.
Results:
year location location_NN_1 location_NN_2 location_NN_3
1 2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
2 2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
3 2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
4 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 <NA>
5 2080 43.59375_-116.78125 <NA> <NA> <NA>
Upvotes: 3
Reputation: 12155
We can use base R subsetting to identify the appropriate cells of B
and replace them with NA
. This approach requires the order of the columns in A
and B
to be the same.
We can use a simple conditional statement on dfa
to find the cells with a sigma value not less than 2. Since we don't want to apply the conditional to the year and condition columns, we subset them out before applying the conditional:
!(dfa[-c(1,2)] < 2)
sigma_NN_1 sigma_NN_2 sigma_NN_3
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
[4,] FALSE FALSE TRUE
[5,] TRUE TRUE TRUE
This returns a logical matrix we can use to subset B
to replace values. What happens here is we subset B
twice: first we ignore the year and location columns to get just the location columns, then we use the earlier conditional to select the rows where the matching sigma value is not less than 2 and insert NA
into it:
dfb[-c(1,2)][!(dfa[-c(1,2)] < 2)] <- NA
dfb
year location location_NN_1 location_NN_2 location_NN_3
1 2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
2 2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
3 2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
4 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 <NA>
5 2080 43.59375_-116.78125 <NA> <NA> <NA>
Upvotes: 4