Reputation: 181
I am trying to convert data so that each column is represented by 0's, 1's, and 2's. I have a data frame with 5 populations and 6 variables (there are actually 100+ populations and 5,000+ variables in the real data frame):
          pop Var1 Var2 Var3 Var4 Var5 Var6
1      Crater   11   11   22   44   11   22
2       Teton   14   44   12   34   33   22
3 Vipond Park   44   11   22   44   33   NA
4  Little Joe   11   44   NA   44   13   44
5     Rainier   14   11   11   NA   11   44
Each column contains one of the following pairs of numbers: 1 and 3, 2 and 4, 2 and 3, 1 and 4, 3 and 4, or 1 and 2.
For each column, I need to convert one of the "doubled numbers" to a 0, the OTHER of the doubled numbers to a 2, and then those variables that are a combination of two numbers to a 1 (the intermediate value). (So, 13, 24, 23, 14, 34, and 12 should become 1.)
For example, for Var1 in the data frame above, 11 should be 0, 14 should be 1, and 44 should be 2. Some columns have only one of the doubled numbers, and then the combination of the numbers as well. There is also missing data. For example, I am trying to convert the above data frame to:
          pop Var1 Var2 Var3 Var4 Var5 Var6
1      Crater    0    0    0    0    0    0
2       Teton    1    2    1    1    2    0
3 Vipond Park    2    0    0    0    2   NA
4  Little Joe    0    2   NA    0    1    2
5     Rainier    1    0    2   NA    0    2
Upvotes: 1
Views: 122
Reputation: 270248
Let u be the unique non-NA elements in x. is.twice is a logical vector which is TRUE for the double digits in u and FALSE for the non-double digits in u. uu is the unique double digits and other is the remaining number, or it may be zero length if there is no other number. Finally, compute the labels associated with c(uu, other) and perform the translation of x:
f <- function(x) {
  u <- unique(na.omit(x))
  # separate u into uu (double digits) and other
  is.twice <- u %% 10 == u %/% 10  # true if double digit
  uu <- u[is.twice]
  other <- u[!is.twice]
  # compute labels associated with c(uu, other)
  labels <- c(0, 2)[seq_along(uu)]
  if (length(other) > 0) labels <- c(labels, 1)
  # translate x to appropriate labels
  labels[match(x, c(uu, other))]
}
replace(DF, -1, lapply(DF[-1], f))
which for the sample data gives:
          pop Var1 Var2 Var3 Var4 Var5 Var6
1      Crater    0    0    0    0    0    0
2       Teton    1    2    1    1    2    0
3 Vipond Park    2    0    0    0    2   NA
4  Little Joe    0    2   NA    0    1    2
5     Rainier    1    0    2   NA    0    2
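To see how the intermediate pieces fit together, here is a rough step-by-step trace of the same logic on a single column. It is only an illustrative sketch: the vector x is Var5 from the sample data retyped inline so the snippet runs on its own, and result is a name introduced just for this example.

```r
# Var5 from the sample data, retyped here so the snippet is self-contained
x <- c(11L, 33L, 33L, 13L, 11L)

u <- unique(na.omit(x))            # distinct non-NA values: 11 33 13
is.twice <- u %% 10 == u %/% 10    # TRUE TRUE FALSE
uu <- u[is.twice]                  # doubled values: 11 33
other <- u[!is.twice]              # mixed value: 13

# first doubled value gets 0, second gets 2, the mixed value gets 1
labels <- c(0, 2)[seq_along(uu)]
if (length(other) > 0) labels <- c(labels, 1)

# look up each element's position in c(uu, other) and take its label
result <- labels[match(x, c(uu, other))]
result                             # 0 2 2 1 0
```

Which doubled value becomes 0 and which becomes 2 follows the order of first appearance in the column: here 11 occurs before 33, so 11 maps to 0 and 33 maps to 2.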
Note: The above used this input:
DF <-
structure(list(pop = structure(c(1L, 4L, 5L, 2L, 3L), .Label = c("Crater",
"Little Joe", "Rainier", "Teton", "Vipond Park"), class = "factor"),
Var1 = c(11L, 14L, 44L, 11L, 14L), Var2 = c(11L, 44L, 11L,
44L, 11L), Var3 = c(22L, 12L, 22L, NA, 11L), Var4 = c(44L,
34L, 44L, 44L, NA), Var5 = c(11L, 33L, 33L, 13L, 11L), Var6 = c(22L,
22L, NA, 44L, 44L)), .Names = c("pop", "Var1", "Var2", "Var3",
"Var4", "Var5", "Var6"), class = "data.frame", row.names = c(NA,
-5L))
Update: Fixed.
Upvotes: 3