OverFlow Police
OverFlow Police

Reputation: 861

Use a data table to mask/filter another data table in R

I have a data table, A like:

year          location    sigma_NN_1 sigma_NN_2 sigma_NN_3
2076 43.59375_-116.78125  1.4681173   1.664289   1.735974
2077 43.59375_-116.78125  1.3798515   1.550524   1.551269
2078 43.59375_-116.78125  0.7934367   1.064248   1.177981
2079 43.59375_-116.78125  1.8235574   1.991018   2.288402
2080 43.59375_-116.78125  2.5560329   2.578093   2.589334

And I want to use it to mask another data table where the value of sigmas are below a threshold, lets say 2. Lets say my second data table is B

year            location       location_NN_1      location_NN_2      location_NN_3
2076 43.59375_-116.78125  41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
2077 43.59375_-116.78125  43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
2078 43.59375_-116.78125  41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 41.71875_-74.15625
2080 43.59375_-116.78125  41.34375_-90.78125 41.96875_-86.21875 41.21875_-90.65625

So, I want to have something like B[A<2], but obviously this does not work, otherwise, I would not be here.

Any suggestions?

Expected output:

output

year            location       location_NN_1      location_NN_2      location_NN_3
2076 43.59375_-116.78125  41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
2077 43.59375_-116.78125  43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
2078 43.59375_-116.78125  41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125                 NA
2080 43.59375_-116.78125                  NA                 NA                 NA

The goal is to find locations whose corresponding sigma in data table A is less than 2.

Upvotes: 4

Views: 1456

Answers (3)

akrun
akrun

Reputation: 887088

Assuming that these data.table objects, and assuming that the rows of the 'sigma' columns in 'A' should all be less than the threshold 2.

library(data.table)
nm1 <- grep("sigma", names(A), value = TRUE)
i1 <- setDT(A)[, Reduce(`&`, lapply(.SD, `<`, 2)), .SDcols = nm1]
setDT(B)[i1] 

Update

Based on the expected output

nm2 <- grep("sigma", names(A))
B[, (nm2) := Map(function(x, y) replace(x, y >= 2, NA_character_),
        .SD, A[, nm2, with = FALSE]), .SDcols = nm2][]
# year            location       location_NN_1      location_NN_2      location_NN_3
#1: 2076 43.59375_-116.78125  41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
#2: 2077 43.59375_-116.78125  43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
#3: 2078 43.59375_-116.78125  41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
#4: 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125               <NA>
#5: 2080 43.59375_-116.78125                <NA>               <NA>               <NA>

data

A <- structure(list(year = 2076:2080, location = c("43.59375_-116.78125", 
"43.59375_-116.78125", "43.59375_-116.78125", "43.59375_-116.78125", 
"43.59375_-116.78125"), sigma_NN_1 = c(1.4681173, 1.3798515, 
0.7934367, 1.8235574, 2.5560329), sigma_NN_2 = c(1.664289, 1.550524, 
1.064248, 1.991018, 2.578093), sigma_NN_3 = c(1.735974, 1.551269, 
1.177981, 2.288402, 2.589334)), class = "data.frame", row.names = c(NA, 
-5L))

B <- structure(list(year = 2076:2080, location = c("43.59375_-116.78125", 
"43.59375_-116.78125", "43.59375_-116.78125", "43.59375_-116.78125", 
"43.59375_-116.78125"), location_NN_1 = c("41.15625_-90.65625", 
"43.34375_-78.15625", "41.34375_-90.78125", "43.53125_-116.78125", 
"41.34375_-90.78125"), location_NN_2 = c("41.21875_-90.65625", 
"43.34375_-78.21875", "41.21875_-90.65625", "41.34375_-90.78125", 
"41.96875_-86.21875"), location_NN_3 = c("41.15625_-90.65625", 
"43.28125_-78.15625", "41.53125_-73.96875", "41.71875_-74.15625", 
"41.21875_-90.65625")), class = "data.frame", row.names = c(NA, 
-5L))

Upvotes: 3

thc
thc

Reputation: 9705

Simple base R solution:

B[-(1:2)][A[-(1:2)]>=2] <- NA

Select all columns except first and second B[-(1:2)].

Then use the vectorized logical expression A[-(1:2)]>=2 to set the correct elements to NA.

Results:

  year            location       location_NN_1      location_NN_2      location_NN_3
1 2076 43.59375_-116.78125  41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
2 2077 43.59375_-116.78125  43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
3 2078 43.59375_-116.78125  41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
4 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125               <NA>
5 2080 43.59375_-116.78125                <NA>               <NA>               <NA>

Upvotes: 3

divibisan
divibisan

Reputation: 12155

We can use base R subsetting to identify the appropriate cells of B and replace them with NA. This approach requires the order of the columns in A and B to be the same.

We can use a simple conditional statement on dfa to find the cells with a sigma value not less than 2. Since we don't want to apply the conditional to the year and condition columns, we subset them out before applying the conditional:

!(dfa[-c(1,2)] < 2)
     sigma_NN_1 sigma_NN_2 sigma_NN_3
[1,]      FALSE      FALSE      FALSE
[2,]      FALSE      FALSE      FALSE
[3,]      FALSE      FALSE      FALSE
[4,]      FALSE      FALSE       TRUE
[5,]       TRUE       TRUE       TRUE

This returns a logical matrix we can use to subset B to replace values. What happens here is we subset B twice: first we ignore the year and location columns to get just the location columns, then we use the earlier conditional to select the rows where the matching sigma value is not less than 2 and insert NA into it:

dfb[-c(1,2)][!(dfa[-c(1,2)] < 2)] <- NA
dfb

  year            location       location_NN_1      location_NN_2      location_NN_3
1 2076 43.59375_-116.78125  41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
2 2077 43.59375_-116.78125  43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
3 2078 43.59375_-116.78125  41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
4 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125               <NA>
5 2080 43.59375_-116.78125                <NA>               <NA>               <NA>

Upvotes: 4

Related Questions