Sandy
Sandy

Reputation: 1148

Finding if a value is within the range of other columns

I have a dataframe df which looks like this:

Input:

df <- read.table(text = 

"ID  Q1_PM Q1_TP Q1_overall  Q2_PM  Q2_LS  Q2_overall
 1   1     2     3           1       2     2       
 2   0     NA    NA          2       1     1 
 3   2     1     1           3       4     0  
 4   1     0     2           4       0     2 
 5   NA    1     NA          0       NA    0  
 6   2     0     1           1       NA    NA"   

, header = TRUE)

Desired Output:

To explain a little further, my desired output is as below:

 ID  Q1_PM Q1_TP Q1_overall  Q2_PM  Q2_LS  Q2_overall Q1_check  Q2_check
 1   1     2     3           1       2     2          "above"   "within"
 2   0     NA    NA          2       1     1           NA       "within"
 3   2     1     1           3       4     0          "within"  "below"
 4   1     0     2           4       0     2          "above"   "within"
 5   NA    1     NA          0       NA    0           NA       "within"
 6   2     0     1           1       NA    NA         "within"   NA

Explanation:

Example 1:

Based on the value in columns Q1_PM and Q1_TP, I want to see whether the value in column Q1_overall is within their range or not? If, not in range, is the value above or below the range? To track this, I want to add an additional column Q1_check.

Example 2:

Similarly, based on the values of Q2_PM and Q2_LS, I want to check if the value of Q2_overall is within their range or not? If not in range, is it above or below the range? Again, to track this, I want to add an additional column Q2_check

Requirements:

1- For this, I want to add additional columns Q1_check and Q2_check where the first column is for the comparisons that involve Q1 items and the second column is for the comparisons that involve Q2 items.

2- The columns could contain the following values: above, below and within.

3- The case when the columns named overall have NAs, then the extra columns could also have NAs.

Related posts:

I have looked for related posts such as: Add column with values depending on another column to a dataframe and Create categories by comparing a numeric column with a fixed value but I am running into errors as discussed below.

Partial Solution:

The only solution, I can think of is, along these lines:

df$Q1_check <- ifelse(data$Q1_overall < data$Q1_PM, 'below',
                        ifelse(data$Q1_overall > data$Q1_TP, 'above', 
                               ifelse(is.na(data$Q1_overall), NA, 'within')))

But it results in following error: Error in data$Q1_overall : object of type 'closure' is not subsettable. I do not understand what the possible issue could be.

OR

df %>%
  mutate(Regulation = case_when(Q1_overall < Q1_PM ~ 'below',
                                Q1_overall > Q1_TP ~ 'above', 
                                Q1_PM < Q1_overall < Q1_TP, 'within'))

This also results in error Error: unexpected '<' in: "Q1_overall > Q1_TP ~ 'above', Q1_PM < Q1_overall <"

Edit 1:

How can the solution be extended if (let's say) the columns are these:

"Q1 Comm - 01 Scope Thesis"
"Q1 Comm - 02 Scope Project"
"Q1 Comm - 03 Learn Intern"
"Q1 Comm - 04 Biography"
"Q1 Comm - 05 Exhibit"
"Q1 Comm - 06 Social Act"
"Q1 Comm - 07 Post Project"
"Q1 Comm - 08 Learn Plant"
"Q1 Comm - 09 Study Narrate"
"Q1 Comm - 10 Learn Participate"
"Q1 Comm - 11 Write 1"
"Q1 Comm - 12 Read 2"
"Q1 Comm - Overall Study Plan"

How can we identify when the column Q1 Comm - Overall Study Plan is:

1 - Below the min() of all the other columns, or

2 - Above the max() of all the other columns, or

3 - Within the range of all the other columns

Edit 2:

For the updated fields, I am also including the dput(df)

dput(df)

structure(list(ï..ID = c(10L, 31L, 225L, 243L), Q1.Comm...01.Scope.Thesis = c(NA, 
2L, 0L, NA), Q1.Comm...02.Scope.Project = c(NA, NA, NA, 2L), 
    Q1.Comm...03.Learn.Intern = c(4L, NA, NA, NA), Q1.Comm...04.Biography = c(NA, 
    NA, NA, 1L), Q1.Comm...05.Exhibit = c(4L, 2L, NA, NA), Q1.Comm...06.Social.Act = c(NA, 
    NA, NA, 3L), Q1.Comm...07.Post.Project = c(NA, NA, 3L, NA
    ), Q1.Comm...08.Learn.Plant = c(NA, NA, NA, 4L), Q1.Comm...09.Study.Narrate = c(NA, 
    NA, 0L, NA), Q1.Comm...10.Learn.Participate = c(4L, NA, NA, 
    NA), Q1.Comm...11.Write.1 = c(NA, 2L, NA, NA), Q1.Comm...12.Read.2 = c(NA, 
    NA, 1L, NA), Q1.Comm...Overall.Study.Plan = c(4L, 1L, 2L, 
    NA), X = c(NA, NA, NA, NA), X.1 = c(NA, NA, NA, NA), X.2 = c(NA, 
    NA, NA, NA)), class = "data.frame", row.names = c(NA, -4L
))

Any advice on how to achieve this would be greatly appreciated. Thank you!

Upvotes: 1

Views: 905

Answers (5)

det
det

Reputation: 5232

comparison <- function(dt, group_cols, new_col, compare_col){
  
  dt[, 
     c("min", "max") := transpose(pmap(.SD, range, na.rm = TRUE)), .SDcols = group_cols
     ][,(new_col) := fcase(
       is.na(get(compare_col)), NA_character_,
       get(compare_col) < min, "below",
       get(compare_col) > max, "above",
       default = "within"
     )
     ][]
}

group_cols <- names(df) %>%
 str_subset("^Q[0-9]+") %>%
 str_subset("overall", negate = TRUE) %>%
 split(str_extract(., "^Q[0-9]+"))

new_cols <- names(group_cols) %>% str_c("_check")
compare_cols <- names(group_cols) %>% str_c("_overall")

setDT(df)

pwalk(list(group_cols, new_cols, compare_cols), ~comparison(df, ...))
df[, c("min", "max") := NULL]

Upvotes: 1

koolmees
koolmees

Reputation: 2783

Largely based on Ronak's great solution:

df <- structure(list(ID = c(10L, 31L, 225L, 243L), 
                      `Q1 Comm - 01 Scope Thesis` = c(NA, 2L, 0L, NA), 
                      `Q1 Comm - 02 Scope Project` = c(NA, NA, NA, 2L), 
                      `Q1 Comm - 03 Learn Intern` = c(4L, NA, NA, NA), 
                      `Q1 Comm - 04 Biography` = c(NA, NA, NA, 1L), 
                      `Q1 Comm - 05 Exhibit` = c(4L, 2L, NA, NA), 
                      `Q1 Comm - 06 Social Act` = c(NA, NA, NA, 3L), 
                      `Q1 Comm - 07 Post Project` = c(NA, NA, 3L, NA), 
                      `Q1 Comm - 08 Learn Plant` = c(NA, NA, NA, 4L), 
                      `Q1 Comm - 09 Study Narrate` = c(NA, NA, 0L, NA), 
                      `Q1 Comm - 10 Learn Participate` = c(4L, NA, NA,NA), 
                      `Q1 Comm - 11 Write 1` = c(NA, 2L, NA, NA), 
                      `Q1 Comm - 12 Read 2` = c(NA, NA, 1L, NA), 
                      `Q1 Comm - Overall Study Plan` = c(4L, 1L, 2L, NA), 
                      X = c(NA, NA, NA, NA), 
                      `X 1` = c(NA, NA, NA, NA), 
                      `X 2` = c(NA, NA, NA, NA)), 
                class = "data.frame", row.names = c(NA, -4L))

library(dplyr)

comparison <- function(df, prefix) {
  df <- df[grep(prefix, colnames(df))]
  min <- apply(df[-grep("Overall", colnames(df))], 1, min, na.rm = T)
  max <- apply(df[-grep("Overall", colnames(df))], 1, max, na.rm = T)
  z <- df[grep("Overall", colnames(df))]
  case_when(is.na(z) ~ NA_character_,
            z >= min & z <= max ~ 'within', 
            z > max ~ 'above', 
            TRUE ~ 'below')
}

prefixes <- sub(" \\- Overall.*", "", colnames(df[grep("Overall", colnames(df))]))

for (i in prefixes) {
  df <- df %>%
    mutate("{i} - Check" := comparison(df, i))
}


> print(df)
   ID Q1 Comm - 01 Scope Thesis Q1 Comm - 02 Scope Project Q1 Comm - 03 Learn Intern Q1 Comm - 04 Biography
1  10                        NA                         NA                         4                     NA
2  31                         2                         NA                        NA                     NA
3 225                         0                         NA                        NA                     NA
4 243                        NA                          2                        NA                      1
  Q1 Comm - 05 Exhibit Q1 Comm - 06 Social Act Q1 Comm - 07 Post Project Q1 Comm - 08 Learn Plant
1                    4                      NA                        NA                       NA
2                    2                      NA                        NA                       NA
3                   NA                      NA                         3                       NA
4                   NA                       3                        NA                        4
  Q1 Comm - 09 Study Narrate Q1 Comm - 10 Learn Participate Q1 Comm - 11 Write 1 Q1 Comm - 12 Read 2
1                         NA                              4                   NA                  NA
2                         NA                             NA                    2                  NA
3                          0                             NA                   NA                   1
4                         NA                             NA                   NA                  NA
  Q1 Comm - Overall Study Plan  X X 1 X 2 Q1 Comm - Check
1                            4 NA  NA  NA          within
2                            1 NA  NA  NA           below
3                            2 NA  NA  NA          within
4                           NA NA  NA  NA            <NA>

Upvotes: 1

AnilGoyal
AnilGoyal

Reputation: 26218

If your columns are named similarly, you may do this for any number of Qs simultaneously.

  • changed - in column names to acceptable _
  • changed Q2_LS to Q2_TP for sake of similarity

What is does -

  • It picks up every column that ends with _overall (2 here but can be any number)
  • check this columns values as -
    • If less than column having name _PM / _TP in lieu of _overall allocates value below
    • If greater than column having name _PM/_TP in lieu of _overall allocates value above
      • To access these column values I used get alongwith cur_column and stringr string replacement function
    • if current value is NA allocated a NA_character
    • otherwise allocates value within
  • Now, for final mutated columns (all at once) it renames these by removing _overall from these columns and pasting _check instead (I used .names argument of across here)
    • For this I used stringr::str_remove inside glue argument (.names follow glue style of formula)
df <- read.table(text = 
                   
                   "ID  Q1_PM Q1_TP Q1_overall  Q2_PM  Q2_TP  Q2_overall
 1   1     2     3           1       2     2       
 2   0     NA    NA          2       1     1 
 3   2     1     1           3       4     0  
 4   1     0     2           4       0     2 
 5   NA    1     NA          0       NA    0  
 6   2     0     1           1       NA    NA"   
 
 , header = TRUE)

df
#>   ID Q1_PM Q1_TP Q1_overall Q2_PM Q2_TP Q2_overall
#> 1  1     1     2          3     1     2          2
#> 2  2     0    NA         NA     2     1          1
#> 3  3     2     1          1     3     4          0
#> 4  4     1     0          2     4     0          2
#> 5  5    NA     1         NA     0    NA          0
#> 6  6     2     0          1     1    NA         NA
library(tidyverse)
df %>% mutate(across(ends_with('overall'), ~ case_when(. < pmin(get(str_replace(cur_column(), '_overall', '_PM')),
                                                                 get(str_replace(cur_column(), '_overall', '_TP'))) ~ 'below',
                                                       . > pmax(get(str_replace(cur_column(), '_overall', '_PM')),
                                                                 get(str_replace(cur_column(), '_overall', '_TP'))) ~ 'above',
                                                       is.na(.) ~ NA_character_,
                                                       TRUE ~ 'within'),
                     .names = '{str_remove(.col,"_overall")}_check'))
#>   ID Q1_PM Q1_TP Q1_overall Q2_PM Q2_TP Q2_overall Q1_check Q2_check
#> 1  1     1     2          3     1     2          2    above   within
#> 2  2     0    NA         NA     2     1          1     <NA>   within
#> 3  3     2     1          1     3     4          0   within    below
#> 4  4     1     0          2     4     0          2    above   within
#> 5  5    NA     1         NA     0    NA          0     <NA>   within
#> 6  6     2     0          1     1    NA         NA   within     <NA>

Created on 2021-06-09 by the reprex package (v2.0.0)

Upvotes: 1

Yuriy Saraykin
Yuriy Saraykin

Reputation: 8880

df <- read.table(text = 
                   
                   "ID  Q1-PM Q1-TP Q1-overall  Q2-PM  Q2-LS  Q2-overall
 1   1     2     3           1       2     2       
 2   0     NA    NA          2       1     1 
 3   2     1     1           3       4     0  
 4   1     0     2           4       0     2 
 5   NA    1     NA          0       NA    0  
 6   2     0     1           1       NA    NA"   
                 
                 , header = TRUE)

library(tidyverse)


f <- function(x, y, z){
  case_when(
    z < pmin(x, y, na.rm = TRUE) ~ "below",
    z > pmax(x, y, na.rm = TRUE) ~ "abowe",
    between(z, pmin(x, y, na.rm = TRUE), pmax(x, y, na.rm = TRUE)) ~ "within"
  )
}

df %>%
  rowwise() %>% 
  mutate(Q1_check = f(Q1.PM, Q1.TP, Q1.overall),
         Q2_check = f(Q2.PM, Q2.LS, Q2.overall))
#> # A tibble: 6 x 9
#> # Rowwise: 
#>      ID Q1.PM Q1.TP Q1.overall Q2.PM Q2.LS Q2.overall Q1_check Q2_check
#>   <int> <int> <int>      <int> <int> <int>      <int> <chr>    <chr>   
#> 1     1     1     2          3     1     2          2 abowe    within  
#> 2     2     0    NA         NA     2     1          1 <NA>     within  
#> 3     3     2     1          1     3     4          0 within   below   
#> 4     4     1     0          2     4     0          2 abowe    within  
#> 5     5    NA     1         NA     0    NA          0 <NA>     within  
#> 6     6     2     0          1     1    NA         NA within   <NA>

Created on 2021-06-09 by the reprex package (v2.0.0)

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388817

Seems a very long winded approach -

library(dplyr)

comparison <- function(x, y, z) {
  case_when(is.na(z) ~ NA_character_,
            z >= x & z <= y | 
              z >= y & z <= x |
              is.na(x) & y == z |
              is.na(y) & x == z ~ 'within', 
            z > x & z > y ~ 'above', 
            TRUE ~ 'below')
}

df %>%
  mutate(Q1_check = comparison(Q1.PM, Q1.TP, Q1.overall), 
          Q2_check = comparison(Q2.PM, Q2.LS, Q2.overall))
  
         
#  ID Q1.PM Q1.TP Q1.overall Q2.PM Q2.LS Q2.overall Q1_check Q2_check
#1  1     1     2          3     1     2          2    above   within
#2  2     0    NA         NA     2     1          1     <NA>   within
#3  3     2     1          1     3     4          0   within    below
#4  4     1     0          2     4     0          2    above   within
#5  5    NA     1         NA     0    NA          0     <NA>   within
#6  6     2     0          1     1    NA         NA   within     <NA>

Upvotes: 1

Related Questions