koteletje

Reputation: 679

Converting numeric values to factor levels with factor levels assigned on the basis of the numeric ordering

Consider the data frame

a = c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)
b = c(letters[5:9], letters[2:6])
c = data.frame(var1 = a, var2 = b)

I want to convert all values in the data frame to consecutive integer factor levels, starting from 1, and then use these as numeric values to compute something. (In reality I don't do this for the letters; I just added them to explain my problem ;) )

With some help (Converting numeric values of multiple columns to factor levels that are consecutive integers in (descending) order), I did this with:

c[] = lapply(c, function(x) {levels(x) <- 1:length(unique(x)); x})

Unfortunately, this only replaces the values with their respective factor levels for the character column var2, but not for the numeric column var1 (notice the 0 in column var1):
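A minimal sketch of why this happens, using a hypothetical short vector x and factor f (not the OP's data): on a factor, levels<- renames the levels, so the displayed values change; on a plain numeric vector it only attaches a "levels" attribute and leaves the values untouched.

```r
# Hypothetical example vectors, not taken from the question
x <- c(0, 1, 3, 12)
f <- factor(c("e", "f", "b"))  # levels are "b", "e", "f"

# On a numeric vector, levels<- only sets an attribute; the values stay 0, 1, 3, 12
levels(x) <- 1:length(unique(x))
print(x)

# On a factor, levels<- renames each level, so "b", "e", "f" become "1", "2", "3"
levels(f) <- 1:length(unique(f))
print(f)
```

Note that the output shown for var2 assumes var2 is a factor, which is what data.frame() produced by default before R 4.0.0 (stringsAsFactors = TRUE); with a character column, levels<- would leave the letters as they are.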

> c
   var1 var2
1     0    4
2     1    5
3     3    6
4     5    7
...

To work around the problem, I converted all columns to character when creating c:

c = as.data.frame(sapply(data.frame(var1 = a, var2 = b), as.character))

This yields

   var1 var2
1     1    4
2     2    5
3     4    6
4     5    7
5     6    8
6     1    1
7     2    2
8     4    3
9     6    4
10    3    5

The problem here, however, is that the value 12 (c[10, 'var1']) in column var1 is treated as the 3rd value (it gets factor level 3, after levels 1 and 2 for values 0 and 1) rather than the last value (factor level 6, because it is the largest numeric value in var1). This happens because the levels of a character column are sorted as strings, and "12" sorts before "3".
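The cause can be checked directly with the values of var1 (a small sketch): factor() sorts the unique values before turning them into labels, so the resulting order depends on whether it receives numbers or strings.

```r
vals <- c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)

# Numeric input: levels follow numeric order, so 12 is the last level
levels(factor(vals))

# Character input: levels follow string order, so "12" comes before "3"
levels(factor(as.character(vals)))

# The integer codes show where 12 ends up in each case
as.integer(factor(vals))[10]                # level 6
as.integer(factor(as.character(vals)))[10]  # level 3
```

This also suggests that calling factor() on the numeric column itself, rather than on its character version, already yields the numeric ordering, and as.integer() then returns the consecutive codes.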

Is there a way to assign factor levels based on the numeric ordering while also replacing the numeric values with their factor levels?

Upvotes: 1

Views: 997

Answers (1)

akrun

Reputation: 886948

Based on the description, it seems the OP wants to replace the values with numeric levels starting from 1. This can be done by using match against the sorted unique values of each column:

c[] <- lapply(c, function(x) factor(match(x, sort(unique(x)))))
c
#    var1 var2
#1     1    4
#2     2    5
#3     3    6
#4     4    7
#5     5    8
#6     1    1
#7     2    2
#8     3    3
#9     5    4
#10    6    5

data

a <- c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)
b <- c(letters[5:9], letters[2:6])
c <- data.frame(var1 = a, var2 = b)
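To see what the function in the answer does to a single column, here is the same match() call unpacked on var1 (a sketch; the intermediate names lookup, codes and f are only for illustration):

```r
a <- c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)

# Distinct values in increasing numeric order: 0 1 3 5 6 12
lookup <- sort(unique(a))

# match() returns the position of each value in lookup, i.e. its rank among the distinct values
codes <- match(a, lookup)
codes  # 1 2 3 4 5 1 2 3 5 6

# Wrapping in factor() as in the answer keeps the same codes; as.integer() gets the numbers back
f <- factor(codes)
as.integer(f)
```

Because the lookup is sorted numerically before matching, 12 gets rank 6 instead of 3. The result is the same as as.integer(factor(a)); match() just makes the ranking step explicit.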

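Based on the code in the comments, which used str_pad, another option is base R sprintf. Padding every value with leading zeros to the same width makes the string order match the numeric order: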

c <- data.frame(var1 = sprintf("%02d", a), var2=b, stringsAsFactors=FALSE)
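A short check of the padding idea (a sketch; it assumes whole, non-negative numbers with at most two digits, since %02d pads to a minimum width of 2):

```r
a <- c(0, 1, 3, 5, 6, 0, 1, 3, 6, 12)

# Zero-pad to width 2: "00" "01" "03" "05" "06" "00" "01" "03" "06" "12"
padded <- sprintf("%02d", a)

# Equal-width digit strings sort the same way as the numbers, so "12" is now the last level
levels(factor(padded))
as.integer(factor(padded))  # 1 2 3 4 5 1 2 3 5 6

# The assumption breaks once a value needs three digits: "100" sorts before "12"
sort(sprintf("%02d", c(12, 100)))
```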

Upvotes: 2
