Reputation: 773
I have a simple problem. I have a large data frame and need to replace the values in its second column (cluster) according to this mapping:
1 -> 3
2 -> 5
3 -> 1
5 -> 2
> dput(head(df))
structure(list(Target = c("TRINITY_GG_100011_c0_g1_i3.mrna1",
"TRINITY_GG_100011_c0_g1_i5.mrna1", "TRINITY_GG_100011_c0_g1_i6.mrna1",
"TRINITY_GG_100011_c0_g1_i9.mrna1", "TRINITY_GG_100016_c0_g1_i1.mrna1",
"TRINITY_GG_100016_c0_g1_i2.mrna1"), cluster = c(2L, 5L, 5L,
3L, 4L, 5L), AAA = c(9L, 7L, 8L, 7L,
5L, 5L)), row.names = c(NA, 6L), class = "data.frame")
# normally I would do it like this:
df$cluster[df$cluster == 1] <- 3
The problem is that once I change 1 to 3, the later step that changes 3 to 1 will change those same values back again. So I can't approach this sequentially. I need something that looks at the original numbers and changes them all at once.
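To make the pitfall concrete, here is a minimal sketch on a toy vector. The names x, seq_x, orig and fixed are made up for illustration and are not part of the real data. It shows the sequential approach collapsing values, and one possible workaround: evaluating every condition against an untouched copy.

```r
# Toy vector standing in for df$cluster (hypothetical data)
x <- c(1L, 2L, 3L, 5L, 4L)

# Sequential replacement: the second step also hits values changed by the first
seq_x <- x
seq_x[seq_x == 1] <- 3L
seq_x[seq_x == 3] <- 1L   # the former 1s are 3s by now, so they flip back
# seq_x is now 1 2 1 5 4

# Testing each condition against an untouched copy avoids the overlap
orig <- x
fixed <- x
fixed[orig == 1] <- 3L
fixed[orig == 2] <- 5L
fixed[orig == 3] <- 1L
fixed[orig == 5] <- 2L
# fixed is now 3 5 1 2 4
```

This works but gets tedious with many pairs, which is why a lookup-based replacement is attractive.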
Upvotes: 3
Views: 63
Reputation: 101044
A base R option using match + ifelse
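from <- c(1, 2, 3, 5)  # original values
to   <- c(3, 5, 1, 2)  # their replacements
transform(
  df,
  # look up each value's position in `from`; values not listed stay unchanged
  cluster = ifelse(cluster %in% from, to[match(cluster, from)], cluster)
)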
gives
Target cluster AAA
1 TRINITY_GG_100011_c0_g1_i3.mrna1 5 9
2 TRINITY_GG_100011_c0_g1_i5.mrna1 2 7
3 TRINITY_GG_100011_c0_g1_i6.mrna1 2 8
4 TRINITY_GG_100011_c0_g1_i9.mrna1 1 7
5 TRINITY_GG_100016_c0_g1_i1.mrna1 4 5
6 TRINITY_GG_100016_c0_g1_i2.mrna1 2 5
Upvotes: 1
Reputation: 886938
We could use a named vector and replace
library(dplyr)
df %>%
  mutate(cluster = coalesce(
    setNames(c(3, 5, 1, 2), c(1, 2, 3, 5))[as.character(cluster)],
    cluster))
-output
# Target cluster AAA
#1 TRINITY_GG_100011_c0_g1_i3.mrna1 5 9
#2 TRINITY_GG_100011_c0_g1_i5.mrna1 2 7
#3 TRINITY_GG_100011_c0_g1_i6.mrna1 2 8
#4 TRINITY_GG_100011_c0_g1_i9.mrna1 1 7
#5 TRINITY_GG_100016_c0_g1_i1.mrna1 4 5
#6 TRINITY_GG_100016_c0_g1_i2.mrna1 2 5
One drawback of indexing a named vector is that it returns NA for elements that are not in the named vector. To keep the original values in those positions, wrap the lookup in coalesce, so that wherever there is an NA in the updated column, the corresponding value of the old column is returned.
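The NA behaviour and the fallback can be seen in isolation with a small base R sketch. The vector below copies the cluster values from the question. The names lookup, mapped and result are illustrative, and ifelse with is.na stands in for coalesce here so that no package is needed.

```r
# Cluster values from the question; 4 has no entry in the lookup
cluster <- c(2L, 5L, 5L, 3L, 4L, 5L)
lookup <- setNames(c(3, 5, 1, 2), c(1, 2, 3, 5))

# Indexing by name yields NA wherever the key is missing
mapped <- unname(lookup[as.character(cluster)])
# mapped is 5 2 2 1 NA 2

# Fall back to the original value wherever the lookup gave NA
result <- ifelse(is.na(mapped), cluster, mapped)
# result is 5 2 2 1 4 2
```

Note that result is a double vector here because the lookup values are doubles; wrap it in as.integer if the column should stay integer.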
Or we can do a join with a key/value dataset
library(data.table)
setDT(df)[data.frame(cluster = c(1, 2, 3, 5), new = c(3, 5, 1, 2)),
cluster := new, on = .(cluster)]
Upvotes: 1