user3545679

Reputation: 181

Conditionally convert numbers in an R data frame

I am trying to convert data so that each column is represented by 0's, 1's, and 2's. I have a data frame with 5 populations and 6 variables (there are actually 100+ populations and 5,000+ variables in the real data frame):

               pop      Var1    Var2    Var3     Var4     Var5     Var6 
1           Crater      11      11      22       44       11       22       
2            Teton      14      44      12       34       33       22       
3      Vipond Park      44      11      22       44       33       NA       
4       Little Joe      11      44      NA       44       13       44       
5          Rainier      14      11      11       NA       11       44       

Each column contains one of the following combinations of numbers: 1 and 3, 2 and 4, 2 and 3, 1 and 4, 3 and 4, 1 and 2.

For each column, I need to convert one of the "doubled numbers" to a 0, the OTHER doubled number to a 2, and any value that is a combination of the two numbers to a 1 (the intermediate value). (So, 13, 24, 23, 14, 34, and 12 should become 1.)

For example, for Var1 in the data frame above, 11 should be 0, 14 should be 1, and 44 should be 2. Some columns have only one of the doubled numbers, and then the combination of the numbers as well. There is also missing data. For example, I am trying to convert the above data frame to:

               pop      Var1    Var2    Var3     Var4     Var5     Var6 
1           Crater      0       0       0        0        0        0       
2            Teton      1       2       1        1        2        0       
3      Vipond Park      2       0       0        0        2        NA       
4       Little Joe      0       2       NA       0        1        2       
5          Rainier      1       0       2        NA       0        2  

Upvotes: 1

Views: 122

Answers (1)

G. Grothendieck

Reputation: 270248

Let u be the unique non-NA elements of x. is.twice is a logical vector that is TRUE for the doubled values in u (such as 11 or 44) and FALSE for the others. uu holds the doubled values and other holds the remaining mixed value, or is zero length if the column has none. The doubled values are labelled 0 and 2 in order of first appearance in the column, the mixed value is labelled 1, and x is then translated using those labels:

f <- function(x) {   

   u <- unique(na.omit(x))

   # separate u into uu (double digits) and other
   is.twice <- u %% 10 == u %/% 10 # true if double digit
   uu <- u[is.twice]
   other <- u[!is.twice]

   # compute labels associated with c(uu, other)
   labels <- c(0, 2)[seq_along(uu)]
   if (length(other) > 0) labels <- c(labels, 1)

   # translate x to appropriate labels
   labels[match(x, c(uu, other))]

}

replace(DF, -1, lapply(DF[-1], f))

which for the sample data gives:

          pop Var1 Var2 Var3 Var4 Var5 Var6
1      Crater    0    0    0    0    0    0
2       Teton    1    2    1    1    2    0
3 Vipond Park    2    0    0    0    2   NA
4  Little Joe    0    2   NA    0    1    2
5     Rainier    1    0    2   NA    0    2
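
To make the intermediate steps concrete, here is a short trace of the same logic on a single column. It is only a sketch: x holds the Var4 values from the sample data (a column with just one doubled number), and the other variable names simply mirror those inside f.

```r
# Trace of the steps inside f for one column (Var4 of the sample data).
x <- c(44L, 34L, 44L, 44L, NA)

u <- unique(na.omit(x))                 # 44 34
is.twice <- u %% 10 == u %/% 10         # TRUE FALSE
uu <- u[is.twice]                       # 44
other <- u[!is.twice]                   # 34

labels <- c(0, 2)[seq_along(uu)]        # 0 (only one doubled value present)
if (length(other) > 0) labels <- c(labels, 1)   # 0 1

# match() gives each element's position in c(uu, other); NA stays NA.
result <- labels[match(x, c(uu, other))]
print(result)                           # 0 1 0 0 NA
```

Because only 44 appears as a doubled value here, it takes the label 0 and the label 2 is never used for this column.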

Note: The above used this input:

DF <- 
structure(list(pop = structure(c(1L, 4L, 5L, 2L, 3L), .Label = c("Crater", 
"Little Joe", "Rainier", "Teton", "Vipond Park"), class = "factor"), 
    Var1 = c(11L, 14L, 44L, 11L, 14L), Var2 = c(11L, 44L, 11L, 
    44L, 11L), Var3 = c(22L, 12L, 22L, NA, 11L), Var4 = c(44L, 
    34L, 44L, 44L, NA), Var5 = c(11L, 33L, 33L, 13L, 11L), Var6 = c(22L, 
    22L, NA, 44L, 44L)), .Names = c("pop", "Var1", "Var2", "Var3", 
"Var4", "Var5", "Var6"), class = "data.frame", row.names = c(NA, 
-5L))
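
The structure() call above is dput-style output. If that is hard to read, the same values can be typed in with data.frame(), as in the sketch below, which then applies the conversion in assignment form. f is repeated from the answer only so that the snippet runs on its own, and DF2 is an arbitrary name. One difference from the original input is that pop is a character column here rather than a factor.

```r
# f as defined in the answer, repeated so this snippet is self-contained.
f <- function(x) {
   u <- unique(na.omit(x))
   is.twice <- u %% 10 == u %/% 10
   uu <- u[is.twice]
   other <- u[!is.twice]
   labels <- c(0, 2)[seq_along(uu)]
   if (length(other) > 0) labels <- c(labels, 1)
   labels[match(x, c(uu, other))]
}

# Same values as the sample data; DF2 is an illustrative name.
DF2 <- data.frame(
   pop  = c("Crater", "Teton", "Vipond Park", "Little Joe", "Rainier"),
   Var1 = c(11L, 14L, 44L, 11L, 14L),
   Var2 = c(11L, 44L, 11L, 44L, 11L),
   Var3 = c(22L, 12L, 22L, NA, 11L),
   Var4 = c(44L, 34L, 44L, 44L, NA),
   Var5 = c(11L, 33L, 33L, 13L, 11L),
   Var6 = c(22L, 22L, NA, 44L, 44L),
   stringsAsFactors = FALSE
)

# Convert every column except the first, modifying DF2 in place.
DF2[-1] <- lapply(DF2[-1], f)
print(DF2)
```

Unlike replace(), which returns a modified copy, the assignment form overwrites the columns of DF2 directly.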

Update: Fixed.

Upvotes: 3
