3DPR
3DPR

Reputation: 21

How do you add a new column to a data frame using two different variables in R?

I have a data frame called test_data:

date         test     score  
2015-10-26   test1      4.00  
2015-10-26   test2      4.99  
2015-10-26   test3      15.10  
2015-10-26   this continues to test23  
2016-05-01   test1      4.20  
2016-05-01   test2      5.50  
2016-05-01   test3      16.2  
2016-05-01   this continues to test23

I have a data frame that contains a high and low threshold for each test:

test     low     high  
test1    3.0      6.0  
test2    6.0      8.0  
test3    12.0     18.0  

...this continues to test23

I am trying to evaluate test_data and determine if the score is outside of the threshold range. For example, test1 on 2015-10-26 is not outside the threshold range. However, test2 on 2015-10-26 is outside the threshold range.

Here is the brute force method:

test1_grp <- filter(test_data, test == 'test1')  
test1_grp <- mutate(test1_grp, out_thresh = if else((test1_grp$score > thresh[thresh$test == "test1", 3]) | (test1_grp$score < thresh[thresh$test == "test1", 2]), 'yes', 'no'))  
test2_grp <- filter(test_data, test == 'test2')  
test2_grp <- mutate(test2_grp, out_thresh = if else((test2_grp$score > thresh[thresh$test == "test2", 3]) | (test2_grp$score < thresh[thresh$test == "test2", 2]), 'yes', 'no'))  

and so on, and so on to test23. There has to be a more efficient way.

I tried to use group_by but can't figure out how to assign the high and low thresholds for each group:

test_data %>% dplyr::group_by(test) %>% 
dplyr::mutate(out_thresh = ifelse((score > thresh[thresh$test == "test1",3]) | (score < thresh[thresh$test == "test1", 2]), 
'yes', 'no'))  

Data (thanks @akrun)

test_data <- structure(list(date = c("2015-10-26", "2015-10-26", "2015-10-26", 
"2016-05-01", "2016-05-01", "2016-05-01"), test = c("test1", 
"test2", "test3", "test1", "test2", "test3"), score = c(4, 4.99, 
15.1, 4.2, 5.5, 16.2)), .Names = c("date", "test", "score"), 
class = "data.frame", row.names = c(NA, -6L))

threshold <- structure(list(test = c("test1", "test2", "test3"), low = c(3, 
 6, 12), high = c(6, 8, 18)), .Names = c("test", "low", "high"
), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 1

Views: 73

Answers (2)

akrun
akrun

Reputation: 886938

We can use left_join

library(dplyr)
left_join(test_data, threshold, by = "test") %>%
             mutate(grp = score >low & score < high)
#        date  test score low high   grp
#1 2015-10-26 test1  4.00   3    6  TRUE
#2 2015-10-26 test2  4.99   6    8 FALSE
#3 2015-10-26 test3 15.10  12   18  TRUE
#4 2016-05-01 test1  4.20   3    6  TRUE
#5 2016-05-01 test2  5.50   6    8 FALSE
#6 2016-05-01 test3 16.20  12   18  TRUE

This can be also done using match from base R

i1 <- match(test_data$test, threshold$test)
with(threshold, test_data$score > low[i1] & test_data$score < high[i1])
#[1]  TRUE FALSE  TRUE  TRUE FALSE  TRUE

NOTE: Here I am using TRUE/FALSE instead of yes/no as it much easier to manipulate

Upvotes: 1

bouncyball
bouncyball

Reputation: 10761

I think what you'd want to do is merge the two data.frames together, then use an ifelse statement:

merged_df <- merge(test_data, threshold, by = 'test')
merged_df$ThresholdFlag <- with(merged_df, ifelse(score > low & score < high, 'Yes','No'))

   test       date score low high ThresholdFlag
1 test1 2015-10-26  4.00   3    6           Yes
2 test1 2016-05-01  4.20   3    6           Yes
3 test2 2015-10-26  4.99   6    8            No
4 test2 2016-05-01  5.50   6    8            No
5 test3 2015-10-26 15.10  12   18           Yes
6 test3 2016-05-01 16.20  12   18           Yes

There's also a way to do this using the data.table package:

library(data.table)
setDT(test_data)
setkey(test_data, test)
setDT(threshold)
setkey(threshold, test)

test_dt <- test_data[threshold][,ThresholdFlag := (score > low & score < high)]
         date  test score low high ThresholdFlag
1: 2015-10-26 test1  4.00   3    6          TRUE
2: 2016-05-01 test1  4.20   3    6          TRUE
3: 2015-10-26 test2  4.99   6    8         FALSE
4: 2016-05-01 test2  5.50   6    8         FALSE
5: 2015-10-26 test3 15.10  12   18          TRUE
6: 2016-05-01 test3 16.20  12   18          TRUE

Upvotes: 1

Related Questions