procerus

Reputation: 294

mutate string into numeric, ignore alphabetical order of factor

I am trying to recode factor levels into numbers using the mutate function, but I want to ignore the alphabetical order of the levels. Some values occur more than once, and I want each value to be numbered according to the order in which it first appears in the data frame, with repeated values getting the same number as their first occurrence. Example:

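library(stringi)
library(dplyr)   # provides %>% and mutate()
set.seed(234)

# 20 random single-character strings
data <- stri_rand_strings(20, 1)
data <- as.data.frame(data)
# factor() sorts its levels, so num follows the sorted order, not the row order
data2 <- data %>% mutate(num = as.numeric(factor(data)))
data2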

Expected outcome:

# keep only the original column
dat <- data2[, -2]
# numbering by first appearance for the first six rows
order <- c(1, 2, 3, 2, 4, 5)
expected_result <- cbind.data.frame(head(dat), order)
expected_result

Upvotes: 1

Views: 360

Answers (2)

Ben

Reputation: 30474

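I think you can just create a new factor and set its levels to the unique values of data2$data in your example. unique() returns the values in order of first appearance, so the levels follow the row order rather than the alphabetical order: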

new_fac <- factor(data2$data, levels = unique(data2$data))

The numeric codes can then be obtained with as.numeric():

new_order <- as.numeric(new_fac)

And this is what your final result would look like:

head(data.frame(new_fac, new_order))

  new_fac new_order
1       k         1
2       m         2
3       1         3
4       m         2
5       4         4
6       d         5
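To see the difference in isolation, here is a minimal base R sketch on a small made-up vector (the vector x is an assumption for illustration, not the question's data). It contrasts the default sorted levels with levels taken from unique():

```r
# Hypothetical toy input, chosen so that sorted order and first-appearance order differ
x <- c("k", "m", "1", "m", "4", "d")

# Default: factor() sorts the levels, so the codes follow the sorted order
default_codes <- as.integer(factor(x))

# Levels taken from unique(): codes follow the order of first appearance
ordered_codes <- as.integer(factor(x, levels = unique(x)))

print(ordered_codes)  # 1 2 3 2 4 5
```

The repeated "m" gets the same code as its first occurrence, which is the behaviour asked for in the question.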

Or in your example with dplyr, you can do:

data %>%
  mutate(num = as.numeric(factor(data, levels = unique(data))))

Upvotes: 2

Otto Kässi

Reputation: 3083

You could accomplish this with a helper table that records, for each distinct string, the order in which it first appears in your table. For example:

library(stringi)
library(tidyverse)

# generate data (same seed as in the question)
set.seed(234)
data <- stri_rand_strings(20, 1)
data <- as.data.frame(data)

Create helper table:

factorlevels <- data %>% unique() %>% mutate(order = row_number())

... and inner join it to data:

data %>% inner_join(factorlevels) 

Output:

> data %>% inner_join(factorlevels)
Joining, by = "data"
   data order
1     k     1
2     m     2
3     1     3
4     m     2
5     4     4
6     d     5
7     v     6
8     i     7
9     v     6
10    H     8
11    Y     9
12    X    10
13    a    11
14    a    11
15    0    12
16    R    13
17    J    14
18    j    15
19    8    16
20    s    17

I am sure that there is a one-liner approach to this, but I could not figure it out right away.
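One possible one-liner, sketched here in base R on a made-up vector (x is a hypothetical stand-in for the data column), is to look up each value's position in unique() with match():

```r
# Hypothetical toy input standing in for the data column
x <- c("k", "m", "1", "m", "4", "d", "v", "i", "v")

# match() returns, for each element of x, the position of its first match in unique(x)
first_seen <- match(x, unique(x))

print(first_seen)  # 1 2 3 2 4 5 6 7 6
```

The same expression should also work inside mutate(), e.g. mutate(order = match(data, unique(data))), without needing a helper table or a join.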

Upvotes: 1
