rais
rais

Reputation: 103

Selecting a large range of values to recode in R

I want to recode a large number of variables across multiple columns. Here is an example df

df <- data.frame(
  id_number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  age = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
  abc1 = c(501, 502, 503, 504, 505, 506, 507, 508, 509, 510),
  abc2 = c(501, 502, 501, 501, 502, 501, 502, 503, 501, 502),
  abc3 = c(501, 506, 501, 501, 510, 501, 510, 501, 501, 501),
  abc4 = c(507, 505, 501, 501, 501, 501, 501, 501, 501, 501)
)

df

The columns abc1:abc4 has values 501:510 and I am trying to recode 501:508 as 91, 509 as 92 and 510 as 93 across all these columns at once. Here is what I tried -

library(dplyr)
df1 <- 
  df %>%
  mutate(across(
    abc1:abc4,
    ~ recode(
      .x,
      `501:508` = 91L,
      `509` = 92L,
      `510` = 93L
          )
  ))

But I get an error

x NAs introduced by coercion ℹ Input ..1 is across(abc1:abc4, ~recode(.x, `501:508` = 91L, `509` = 92L, `510` = 93L)).NAs introduced by coercionProblem with mutate() input ..1. x Unreplaced values treated as NA as .x is not compatible. Please specify replacements exhaustively or supply .default

However, it works if I change the values one by one but I want to do it all at once as my real data has a very long list of values. Am I doing something wrong with the part

`501:508` = 91L,

Thank you !

Follow-up question from above

Let's say the value of abc1:abc4 is way larger and there is an additional set of value of range 1-175. abc1:abc4 = c(1:175, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510). I just tweaked the values from my previous example to illustrate this here.

df1 <- data.frame(
  id_number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  age = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
  abc1 = c(14, 158, 170, 504, 505, 506, 507, 508, 509, 510),
  abc2 = c(501, 502, 501, 501, 45, 501, 502, 59, 501, 100),
  abc3 = c(89, 506, 12, 501, 510, 13, 510, 501, 11, 501),
  abc4 = c(32, 505, 35, 501, 501, 56, 501, 12, 501, 501)
)

df1

Now I want to recode them all together where 1:175 = 90, 501:508 = 91, 509 = 92 and 510 - 93 across columns abc1:abc4 at one go. How would "nm1" in @akrun's answer be set here. and is there an easier way to do this? Thank you !

Upvotes: 3

Views: 281

Answers (2)

akrun
akrun

Reputation: 887213

An easier option is to match with a named vector

library(dplyr)
nm1 <- setNames(rep(c(91, 92, 93), c(8, 1, 1)), 501:510)
df1 <- df %>%
     mutate(across(abc1:abc4, ~  nm1[as.character(.x)]))

-output

df1
   id_number age abc1 abc2 abc3 abc4
1          1  10   91   91   91   91
2          2  11   91   91   91   91
3          3  12   91   91   91   91
4          4  13   91   91   91   91
5          5  14   91   91   93   91
6          6  15   91   91   91   91
7          7  16   91   91   93   91
8          8  17   91   91   91   91
9          9  18   92   91   91   91
10        10  19   93   91   91   91

The use of named vectors works with recode as well

df %>% 
   mutate(across(abc1:abc4, ~ recode(., !!!  nm1)))

-output

   id_number age abc1 abc2 abc3 abc4
1          1  10   91   91   91   91
2          2  11   91   91   91   91
3          3  12   91   91   91   91
4          4  13   91   91   91   91
5          5  14   91   91   93   91
6          6  15   91   91   91   91
7          7  16   91   91   93   91
8          8  17   91   91   91   91
9          9  18   92   91   91   91
10        10  19   93   91   91   91

For the updated case, we could either extend the named vector with rep

nm2 <- setNames(rep(c(90, 91, 92, 93), c(175, 8, 1, 1)), c(1:175, 501:510))
df1 %>%
      mutate(across(abc1:abc4, ~  nm2[as.character(.x)]))
   id_number age abc1 abc2 abc3 abc4
1          1  10   90   91   90   90
2          2  11   90   91   91   91
3          3  12   90   91   90   90
4          4  13   91   91   91   91
5          5  14   91   90   93   91
6          6  15   91   91   90   90
7          7  16   91   91   93   91
8          8  17   91   90   91   90
9          9  18   92   91   90   91
10        10  19   93   90   91   91

or use the same vector and then create a condition with case_when

df1 %>% 
   mutate(across(abc1:abc4, ~ case_when(. %in% 1:175 ~ 90, 
       TRUE ~ nm1[as.character(.)])))
    id_number age abc1 abc2 abc3 abc4
1          1  10   90   91   90   90
2          2  11   90   91   91   91
3          3  12   90   91   90   90
4          4  13   91   91   91   91
5          5  14   91   90   93   91
6          6  15   91   91   90   90
7          7  16   91   91   93   91
8          8  17   91   90   91   90
9          9  18   92   91   90   91
10        10  19   93   90   91   91

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 101688

Another dplyr option

> df %>%
+   mutate(across(abc1:abc4, ~ 90 + as.integer(cut(., c(500, 508, 509, 510)))))
   id_number age abc1 abc2 abc3 abc4
1          1  10   91   91   91   91
2          2  11   91   91   91   91
3          3  12   91   91   91   91
4          4  13   91   91   91   91
5          5  14   91   91   93   91
6          6  15   91   91   91   91
7          7  16   91   91   93   91
8          8  17   91   91   91   91
9          9  18   92   91   91   91
10        10  19   93   91   91   91

A base R option following a similar idea as above

idx <- startsWith(names(df), "abc")
df[idx] <- 90 + as.integer(cut(unlist(df[idx]), c(500, 508, 509, 510)))

gives

> df
   id_number age abc1 abc2 abc3 abc4
1          1  10   91   91   91   91
2          2  11   91   91   91   91
3          3  12   91   91   91   91
4          4  13   91   91   91   91
5          5  14   91   91   93   91
6          6  15   91   91   91   91
7          7  16   91   91   93   91
8          8  17   91   91   91   91
9          9  18   92   91   91   91
10        10  19   93   91   91   91

Upvotes: 2

Related Questions