Reputation: 21
I have a data frame called test_data
:
date test score
2015-10-26 test1 4.00
2015-10-26 test2 4.99
2015-10-26 test3 15.10
2015-10-26 this continues to test23
2016-05-01 test1 4.20
2016-05-01 test2 5.50
2016-05-01 test3 16.2
2016-05-01 this continues to test23
I have a data frame that contains a high and low threshold for each test:
test low high
test1 3.0 6.0
test2 6.0 8.0
test3 12.0 18.0
...this continues to test23
I am trying to evaluate test_data and determine if the score is outside of the threshold range. For example, test1 on 2015-10-26 is not outside the threshold range. However, test2 on 2015-10-26 is outside the threshold range.
Here is the brute force method:
test1_grp <- filter(test_data, test == 'test1')
test1_grp <- mutate(test1_grp, out_thresh = if else((test1_grp$score > thresh[thresh$test == "test1", 3]) | (test1_grp$score < thresh[thresh$test == "test1", 2]), 'yes', 'no'))
test2_grp <- filter(test_data, test == 'test2')
test2_grp <- mutate(test2_grp, out_thresh = if else((test2_grp$score > thresh[thresh$test == "test2", 3]) | (test2_grp$score < thresh[thresh$test == "test2", 2]), 'yes', 'no'))
and so on, and so on to test23. There has to be a more efficient way.
I tried to use group_by but can't figure out how to assign the high and low thresholds for each group:
test_data %>% dplyr::group_by(test) %>%
dplyr::mutate(out_thresh = ifelse((score > thresh[thresh$test == "test1",3]) | (score < thresh[thresh$test == "test1", 2]),
'yes', 'no'))
Data (thanks @akrun)
test_data <- structure(list(date = c("2015-10-26", "2015-10-26", "2015-10-26",
"2016-05-01", "2016-05-01", "2016-05-01"), test = c("test1",
"test2", "test3", "test1", "test2", "test3"), score = c(4, 4.99,
15.1, 4.2, 5.5, 16.2)), .Names = c("date", "test", "score"),
class = "data.frame", row.names = c(NA, -6L))
threshold <- structure(list(test = c("test1", "test2", "test3"), low = c(3,
6, 12), high = c(6, 8, 18)), .Names = c("test", "low", "high"
), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 1
Views: 73
Reputation: 886938
We can use left_join
library(dplyr)
left_join(test_data, threshold, by = "test") %>%
mutate(grp = score >low & score < high)
# date test score low high grp
#1 2015-10-26 test1 4.00 3 6 TRUE
#2 2015-10-26 test2 4.99 6 8 FALSE
#3 2015-10-26 test3 15.10 12 18 TRUE
#4 2016-05-01 test1 4.20 3 6 TRUE
#5 2016-05-01 test2 5.50 6 8 FALSE
#6 2016-05-01 test3 16.20 12 18 TRUE
This can be also done using match
from base R
i1 <- match(test_data$test, threshold$test)
with(threshold, test_data$score > low[i1] & test_data$score < high[i1])
#[1] TRUE FALSE TRUE TRUE FALSE TRUE
NOTE: Here I am using TRUE/FALSE
instead of yes/no
as it much easier to manipulate
Upvotes: 1
Reputation: 10761
I think what you'd want to do is merge
the two data.frames
together, then use an ifelse
statement:
merged_df <- merge(test_data, threshold, by = 'test')
merged_df$ThresholdFlag <- with(merged_df, ifelse(score > low & score < high, 'Yes','No'))
test date score low high ThresholdFlag
1 test1 2015-10-26 4.00 3 6 Yes
2 test1 2016-05-01 4.20 3 6 Yes
3 test2 2015-10-26 4.99 6 8 No
4 test2 2016-05-01 5.50 6 8 No
5 test3 2015-10-26 15.10 12 18 Yes
6 test3 2016-05-01 16.20 12 18 Yes
There's also a way to do this using the data.table
package:
library(data.table)
setDT(test_data)
setkey(test_data, test)
setDT(threshold)
setkey(threshold, test)
test_dt <- test_data[threshold][,ThresholdFlag := (score > low & score < high)]
date test score low high ThresholdFlag
1: 2015-10-26 test1 4.00 3 6 TRUE
2: 2016-05-01 test1 4.20 3 6 TRUE
3: 2015-10-26 test2 4.99 6 8 FALSE
4: 2016-05-01 test2 5.50 6 8 FALSE
5: 2015-10-26 test3 15.10 12 18 TRUE
6: 2016-05-01 test3 16.20 12 18 TRUE
Upvotes: 1