Reputation: 103
I want to recode a large number of variables across multiple columns. Here is an example df
df <- data.frame(
id_number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
age = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
abc1 = c(501, 502, 503, 504, 505, 506, 507, 508, 509, 510),
abc2 = c(501, 502, 501, 501, 502, 501, 502, 503, 501, 502),
abc3 = c(501, 506, 501, 501, 510, 501, 510, 501, 501, 501),
abc4 = c(507, 505, 501, 501, 501, 501, 501, 501, 501, 501)
)
df
The columns abc1:abc4 has values 501:510 and I am trying to recode 501:508 as 91, 509 as 92 and 510 as 93 across all these columns at once. Here is what I tried -
library(dplyr)
df1 <-
df %>%
mutate(across(
abc1:abc4,
~ recode(
.x,
`501:508` = 91L,
`509` = 92L,
`510` = 93L
)
))
But I get an error
x NAs introduced by coercion
ℹ Input ..1
is across(abc1:abc4, ~recode(.x, `501:508` = 91L, `509` = 92L, `510` = 93L))
.NAs introduced by coercionProblem with mutate()
input ..1
.
x Unreplaced values treated as NA as .x is not compatible. Please specify replacements exhaustively or supply .default
However, it works if I change the values one by one but I want to do it all at once as my real data has a very long list of values. Am I doing something wrong with the part
`501:508` = 91L,
Thank you !
Follow-up question from above
Let's say the value of abc1:abc4 is way larger and there is an additional set of value of range 1-175. abc1:abc4 = c(1:175, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510). I just tweaked the values from my previous example to illustrate this here.
df1 <- data.frame(
id_number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
age = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19),
abc1 = c(14, 158, 170, 504, 505, 506, 507, 508, 509, 510),
abc2 = c(501, 502, 501, 501, 45, 501, 502, 59, 501, 100),
abc3 = c(89, 506, 12, 501, 510, 13, 510, 501, 11, 501),
abc4 = c(32, 505, 35, 501, 501, 56, 501, 12, 501, 501)
)
df1
Now I want to recode them all together where 1:175 = 90, 501:508 = 91, 509 = 92 and 510 - 93 across columns abc1:abc4 at one go. How would "nm1" in @akrun's answer be set here. and is there an easier way to do this? Thank you !
Upvotes: 3
Views: 281
Reputation: 887213
An easier option is to match with a named vector
library(dplyr)
nm1 <- setNames(rep(c(91, 92, 93), c(8, 1, 1)), 501:510)
df1 <- df %>%
mutate(across(abc1:abc4, ~ nm1[as.character(.x)]))
-output
df1
id_number age abc1 abc2 abc3 abc4
1 1 10 91 91 91 91
2 2 11 91 91 91 91
3 3 12 91 91 91 91
4 4 13 91 91 91 91
5 5 14 91 91 93 91
6 6 15 91 91 91 91
7 7 16 91 91 93 91
8 8 17 91 91 91 91
9 9 18 92 91 91 91
10 10 19 93 91 91 91
The use of named vectors works with recode
as well
df %>%
mutate(across(abc1:abc4, ~ recode(., !!! nm1)))
-output
id_number age abc1 abc2 abc3 abc4
1 1 10 91 91 91 91
2 2 11 91 91 91 91
3 3 12 91 91 91 91
4 4 13 91 91 91 91
5 5 14 91 91 93 91
6 6 15 91 91 91 91
7 7 16 91 91 93 91
8 8 17 91 91 91 91
9 9 18 92 91 91 91
10 10 19 93 91 91 91
For the updated case, we could either extend the named vector with rep
nm2 <- setNames(rep(c(90, 91, 92, 93), c(175, 8, 1, 1)), c(1:175, 501:510))
df1 %>%
mutate(across(abc1:abc4, ~ nm2[as.character(.x)]))
id_number age abc1 abc2 abc3 abc4
1 1 10 90 91 90 90
2 2 11 90 91 91 91
3 3 12 90 91 90 90
4 4 13 91 91 91 91
5 5 14 91 90 93 91
6 6 15 91 91 90 90
7 7 16 91 91 93 91
8 8 17 91 90 91 90
9 9 18 92 91 90 91
10 10 19 93 90 91 91
or use the same vector and then create a condition with case_when
df1 %>%
mutate(across(abc1:abc4, ~ case_when(. %in% 1:175 ~ 90,
TRUE ~ nm1[as.character(.)])))
id_number age abc1 abc2 abc3 abc4
1 1 10 90 91 90 90
2 2 11 90 91 91 91
3 3 12 90 91 90 90
4 4 13 91 91 91 91
5 5 14 91 90 93 91
6 6 15 91 91 90 90
7 7 16 91 91 93 91
8 8 17 91 90 91 90
9 9 18 92 91 90 91
10 10 19 93 90 91 91
Upvotes: 2
Reputation: 101688
Another dplyr
option
> df %>%
+ mutate(across(abc1:abc4, ~ 90 + as.integer(cut(., c(500, 508, 509, 510)))))
id_number age abc1 abc2 abc3 abc4
1 1 10 91 91 91 91
2 2 11 91 91 91 91
3 3 12 91 91 91 91
4 4 13 91 91 91 91
5 5 14 91 91 93 91
6 6 15 91 91 91 91
7 7 16 91 91 93 91
8 8 17 91 91 91 91
9 9 18 92 91 91 91
10 10 19 93 91 91 91
A base R option following a similar idea as above
idx <- startsWith(names(df), "abc")
df[idx] <- 90 + as.integer(cut(unlist(df[idx]), c(500, 508, 509, 510)))
gives
> df
id_number age abc1 abc2 abc3 abc4
1 1 10 91 91 91 91
2 2 11 91 91 91 91
3 3 12 91 91 91 91
4 4 13 91 91 91 91
5 5 14 91 91 93 91
6 6 15 91 91 91 91
7 7 16 91 91 93 91
8 8 17 91 91 91 91
9 9 18 92 91 91 91
10 10 19 93 91 91 91
Upvotes: 2