jibberyjabber

Reputation: 13

How to reconcile purrr::map with case_when

I want to classify a variable based on predefined thresholds as follows:

library(tidyverse)

df <- tibble(values = sample(1:50))


classes <- c("A","B","C","D")
upper <- c(10,19,34,50)
lower <- c(0, head(upper, -1))  # each class starts where the previous one ends

segment <- df %>% 
  mutate(
    class = case_when(
      values >= lower[1] & values < upper[1] ~ classes[1],
      values >= lower[2] & values < upper[2] ~ classes[2],
      values >= lower[3] & values < upper[3] ~ classes[3],
      values >= lower[4] & values < upper[4] ~ classes[4]
    )
  )

This generates a new variable, class, which takes the class names defined in classes. At the moment case_when is hard-coded with one condition per entry of classes. This is fine as long as the number of classes stays small, but as the number of classes grows, hard-coding becomes impractical. Is it possible to incorporate purrr::map within case_when to handle this?

The following implementation did not work:

segment <- df %>% 
  mutate(
    class = case_when(
      purrr::map(values >= lower & values < upper ~ classes)
    )
  )
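The attempt above fails because case_when expects one two-sided formula per argument, while purrr::map here receives a single formula as its .x and no function at all. A minimal sketch of one way to make the purrr::map idea work, assuming the dplyr, purrr and rlang packages and a fixed 1:50 input (instead of sample()) so the result is reproducible: map builds one formula per class, and rlang's !!! splices the list into case_when as separate arguments.

```r
library(dplyr)
library(purrr)
library(rlang)

# Fixed input (instead of sample()) so the output is reproducible
df <- tibble(values = 1:50)

classes <- c("A", "B", "C", "D")
upper <- c(10, 19, 34, 50)
lower <- c(0, head(upper, -1))

# One formula per class, e.g. values >= 0 & values < 10 ~ "A";
# !! inlines the current threshold and label into each expression
conditions <- map(seq_along(classes), function(i) {
  expr(values >= !!lower[i] & values < !!upper[i] ~ !!classes[i])
})

# !!! splices the list so case_when sees one formula per argument
segment <- df %>%
  mutate(class = case_when(!!!conditions))
```

Adding a class then only means extending classes and upper. As with the hard-coded version, values equal to 50 satisfy none of the conditions (the last one is values < 50) and get NA.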

Upvotes: 0

Views: 413

Answers (3)

Michael

Reputation: 5898

A non-equi join in data.table is probably one of the fastest solutions in R:

library(tidyverse)
library(data.table)
df <- tibble(values = sample(1:50))


classes <- c("A","B","C","D")
upper <- c(10,19,34,50)
lower <- c(0, head(upper, -1))  # each class starts where the previous one ends

setDT(df)

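interval_lookup <- data.table(classes, upper, lower)
# Non-equi update join: i.classes takes the label from interval_lookup for each matching row of df
df[interval_lookup, classes := i.classes, on = c("values >= lower", "values < upper")]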

df
#>     values classes
#>  1:     11       B
#>  2:     31       C
#>  3:     12       B
#>  4:      6       A
#>  5:     29       C
#>  6:     38       D
#>  7:     45       D
#>  8:     28       C
#>  9:     10       B
#> 10:      3       A
#> 11:     15       B
#> 12:     43       D
#> 13:     37       D
#> 14:     14       B
#> 15:     36       D
#> 16:     33       C
#> 17:     27       C
#> 18:      8       A
#> 19:     26       C
#> 20:     47       D
#> 21:      9       A
#> 22:     39       D
#> 23:     22       C
#> 24:     49       D
#> 25:     34       D
#> 26:     23       C
#> 27:     42       D
#> 28:      4       A
#> 29:     32       C
#> 30:     20       C
#> 31:     40       D
#> 32:     21       C
#> 33:     17       B
#> 34:     16       B
#> 35:     30       C
#> 36:     46       D
#> 37:     25       C
#> 38:     24       C
#> 39:      5       A
#> 40:     44       D
#> 41:     41       D
#> 42:     50    <NA>
#> 43:     18       B
#> 44:      1       A
#> 45:     48       D
#> 46:      7       A
#> 47:     19       C
#> 48:      2       A
#> 49:     35       D
#> 50:     13       B
#>     values classes

Created on 2021-01-13 by the reprex package (v0.3.0)
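To see how the join treats the class boundaries, here is a small sketch of the same kind of non-equi update join on a few hand-picked values (chosen for illustration, not taken from the answer). It uses the i. prefix to make explicit that the label comes from the lookup table:

```r
library(data.table)

# Hand-picked values on and around the class boundaries
dt <- data.table(values = c(9, 10, 19, 50))

interval_lookup <- data.table(
  classes = c("A", "B", "C", "D"),
  lower   = c(0, 10, 19, 34),
  upper   = c(10, 19, 34, 50)
)

# Each row of dt gets the class whose [lower, upper) interval contains it;
# rows with no matching interval (here 50) are left as NA
dt[interval_lookup, classes := i.classes, on = .(values >= lower, values < upper)]
dt
```

Because the join is an update by reference, dt keeps its original row order and only gains the classes column.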

Upvotes: 0

akrun

Reputation: 887223

We can also use findInterval:

df$class <- c("A", "B", "C", "D")[findInterval(df$values, c(0, 10, 19, 34, 50))]
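findInterval(x, vec) returns, for each x, the index i with vec[i] <= x < vec[i + 1], so its intervals are closed on the left, like the case_when conditions in the question. A quick check on a few hand-picked boundary values (chosen for illustration):

```r
breaks <- c(0, 10, 19, 34, 50)
classes <- c("A", "B", "C", "D")
x <- c(1, 9, 10, 19, 33, 34, 49, 50)

# Interval index for each value; 50 equals the last break, so it gets index 5
idx <- findInterval(x, breaks)
idx  # 1 1 2 3 3 4 4 5

# Indexing past the end of classes yields NA for 50
classes[idx]

# rightmost.closed = TRUE folds x == max(breaks) into the last interval ("D")
classes[findInterval(x, breaks, rightmost.closed = TRUE)]
```

So with the default arguments, 50 gets NA here just as in the case_when and data.table versions.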

Upvotes: 0

SteveM

Reputation: 2301

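It seems like you could just use the cut function:

breaks <- c(0,10,19,34,50)
labels <- c("A","B","C","D")
# right = FALSE gives intervals closed on the left, [lower, upper), matching the case_when conditions
df$class <- cut(df$values, breaks = breaks, labels = labels, right = FALSE)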
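Note that cut() closes intervals on the right by default, (lower, upper], which shifts values sitting exactly on a break compared with the >= lower & < upper conditions in the question. A small comparison on hand-picked boundary values (chosen for illustration):

```r
breaks <- c(0, 10, 19, 34, 50)
labels <- c("A", "B", "C", "D")
x <- c(9, 10, 19, 50)

# Default right = TRUE: (0,10], (10,19], (19,34], (34,50]
right_closed <- as.character(cut(x, breaks = breaks, labels = labels))
right_closed  # "A" "A" "B" "D"

# right = FALSE: [0,10), [10,19), [19,34), [34,50); 50 is outside every interval
left_closed <- as.character(cut(x, breaks = breaks, labels = labels, right = FALSE))
left_closed   # "A" "B" "C" NA
```

cut() returns a factor, so as.character() is used here only to compare the labels as plain strings.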

Upvotes: 1
