I have multiple columns and I want to code them sequentially, so that the codes in each column continue from where the previous column's codes end. Here is a sample of the columns:
df<-read.table(text=" A M Z X
124321 33333 123 1309
234543 12121 33 1308
130991 200EE 123 1308
130911 200EE 123 1309
124321 12121 33 1309
234543 33333 232 1309", h=T)
I want to get this table:
df1<-read.table(text=" Group1 Group2 Group3 Group4
1 6 9 12
4 5 8 11
3 7 9 11
2 7 9 12
1 5 8 12
4 6 10 12
", h=T)
I have used the following basic code, but in my experience it is not reliable, especially as the number of columns increases:
df$Group1 <- as.integer(as.factor(df$A))
df$Group2 <- as.integer(as.factor(df$M)) + max(df$Group1)
df$Group3 <- as.integer(as.factor(df$Z)) + max(df$Group2)
df$Group4 <- as.integer(as.factor(df$X)) + max(df$Group3)
Is there a better and more reliable solution to get my table?
Upvotes: 1
Views: 42
Reputation: 28705
You can use accumulate from purrr:
library(tidyverse)
df %>%
  mutate_all(~ as.integer(as.factor(.))) %>%
  accumulate(~ .y + max(.x)) %>%
  bind_cols %>%
  rename_all(~ paste0('Group', seq_along(.)))
# # A tibble: 6 x 4
#   Group1 Group2 Group3 Group4
#    <int>  <int>  <int>  <int>
# 1      1      7      9     12
# 2      4      5      8     11
# 3      3      6      9     11
# 4      2      6      9     12
# 5      1      5      8     12
# 6      4      7     10     12
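The second column is different from the one you show, but based on the output below it looks like it's working as expected: as.factor sorts the levels of M as 12121, 200EE, 33333, so they are coded 1, 2, 3 before the offset is added.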
df %>%
  mutate_all(~ as.integer(as.factor(.)))
#   A M Z X
# 1 1 3 2 2
# 2 4 1 1 1
# 3 3 2 2 1
# 4 2 2 2 2
# 5 1 1 1 2
# 6 4 3 3 2
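To see where the difference in the second column comes from, it may help to inspect the factor levels directly. The sketch below uses only base R and the sample data from the question. The match(x, unique(x)) line is an assumption about one alternative you might want (numbering values by first appearance); it is not the ordering shown in the question's expected table.

```r
df <- read.table(text = "A M Z X
124321 33333 123 1309
234543 12121 33 1308
130991 200EE 123 1308
130911 200EE 123 1309
124321 12121 33 1309
234543 33333 232 1309", header = TRUE)

# as.factor() sorts the unique values: alphabetically for character
# columns, numerically for numeric columns
m_levels <- levels(as.factor(df$M))   # "12121" "200EE" "33333"
z_levels <- levels(as.factor(df$Z))   # "33" "123" "232"

# Hypothetical alternative: code values in order of first appearance
m_first_seen <- match(df$M, unique(df$M))   # 1 2 3 3 2 1
```

If a specific ordering is needed, the levels can be passed explicitly via factor(x, levels = ...) before converting to integer.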
Or, borrowing d.b's cumsum/sapply idea (you should accept d.b's answer if you think this method is better):
df %>%
  mutate_all(~ as.integer(as.factor(.))) %>%
  map2_dfc(c(0, cumsum(sapply(., max))[-ncol(.)]), `+`)
# # A tibble: 6 x 4
#       A     M     Z     X
#   <dbl> <dbl> <dbl> <dbl>
# 1     1     7     9    12
# 2     4     5     8    11
# 3     3     6     9    11
# 4     2     6     9    12
# 5     1     5     8    12
# 6     4     7    10    12
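For comparison, the accumulate step can also be written without the tidyverse. This is a minimal sketch using base R's Reduce with accumulate = TRUE, which plays the same role as purrr's accumulate here; the names codes, shifted and result are illustrative only.

```r
df <- read.table(text = "A M Z X
124321 33333 123 1309
234543 12121 33 1308
130991 200EE 123 1308
130911 200EE 123 1309
124321 12121 33 1309
234543 33333 232 1309", header = TRUE)

# Integer factor codes per column, each starting at 1
codes <- lapply(df, function(x) as.integer(as.factor(x)))

# Each column is shifted by the maximum of the previous, already
# shifted column, so the codes keep increasing from left to right
shifted <- Reduce(function(prev, cur) cur + max(prev), codes,
                  accumulate = TRUE)

# Set the column names explicitly
names(shifted) <- paste0("Group", seq_along(shifted))
result <- as.data.frame(shifted)
```

Because the first column is passed through unchanged and every later column only depends on the one before it, this gives the same values as the accumulate pipeline above.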
Upvotes: 1
Reputation: 32558
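# Convert every column to integer factor codes (each column starts at 1)
df2 = lapply(df, function(x) as.integer(as.factor(x)))
# Shift each column by the running total of the preceding columns' maxima
data.frame(Map("+", df2, cumsum(c(0, head(sapply(df2, max), -1)))))
#  A M  Z  X
#1 1 7  9 12
#2 4 5  8 11
#3 3 6  9 11
#4 2 6  9 12
#5 1 5  8 12
#6 4 7 10 12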
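Since the question is about reliability with a growing number of columns, the same idea can be wrapped in a small function. This is only a sketch: sequential_codes is a hypothetical helper name, and the Group prefix follows the naming in the question.

```r
# Hypothetical helper: codes every column and offsets each one so the
# codes continue where the previous column stopped
sequential_codes <- function(data, prefix = "Group") {
  codes <- lapply(data, function(x) as.integer(as.factor(x)))
  # Offset for column k is the sum of the maxima of columns 1..(k - 1)
  offsets <- cumsum(c(0L, head(vapply(codes, max, integer(1)), -1)))
  out <- as.data.frame(Map(`+`, codes, offsets))
  names(out) <- paste0(prefix, seq_along(out))
  out
}

df <- read.table(text = "A M Z X
124321 33333 123 1309
234543 12121 33 1308
130991 200EE 123 1308
130911 200EE 123 1309
124321 12121 33 1309
234543 33333 232 1309", header = TRUE)

res <- sequential_codes(df)

# Sanity check: no code is shared between two different columns
all_codes <- unlist(lapply(res, unique))
no_overlap <- !any(duplicated(all_codes))
```

The function makes no assumption about the number of columns, so adding a fifth or sixth column to df needs no change to the code.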
Upvotes: 1