Reputation: 294
I am trying to recode factor levels into numbers using the mutate function, but I want to ignore the alphabetical order of the levels. Some values occur more than once, and I want each value to get a number, in a new column, based on where it first appears in the data frame, so every repeat gets the same number as its first occurrence. Example:
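library(stringi)
library(dplyr)  # provides %>% and mutate()
set.seed(234)
data <- stri_rand_strings(20, 1)
data <- as.data.frame(data)
# as.numeric(factor()) numbers the levels in sorted order, which is not what I want
data2 <- data %>% mutate(num = as.numeric(factor(data)))
data2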
Expected outcome:
dat<-data2[,-2]
order<-c(1,2,3,2,4,5)
expected_result<-cbind.data.frame(head(dat), order)
expected_result
Upvotes: 1
Views: 360
Reputation: 30474
I think you can just create a new factor and set the levels as the unique values of data2$data in your example:
new_fac <- factor(data2$data, levels = unique(data2$data))
The numeric values can then be obtained with as.numeric():
new_order <- as.numeric(new_fac)
And this is what your final result would look like:
head(data.frame(new_fac, new_order))
  new_fac new_order
1       k         1
2       m         2
3       1         3
4       m         2
5       4         4
6       d         5
Or, in your example with dplyr, you can do:
data %>%
  mutate(num = as.numeric(factor(data, levels = unique(data))))
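To see why setting the levels explicitly matters, here is a minimal sketch on a small hand-written vector. The vector x is made up for illustration and is not the question's random data; it is chosen so that sorted order and order of first appearance differ.

```r
# Hypothetical input (not from the question).
x <- c("k", "m", "b", "m", "z", "d")

# Default factor(): levels are sorted, so the codes follow sorted order.
default_codes <- as.integer(factor(x))

# levels = unique(x): levels keep the order of first appearance,
# so repeated values reuse the code of their first occurrence.
appearance_codes <- as.integer(factor(x, levels = unique(x)))

print(default_codes)     # 3 4 1 4 5 2
print(appearance_codes)  # 1 2 3 2 4 5
```

The second "m" gets 2 in the second result because unique() keeps only the first occurrence of each value, in its original position.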
Upvotes: 2
Reputation: 3083
You could accomplish this with a helper table that numbers each distinct string in the order in which it first appears in your table. For example:
library(stringi)
library(tidyverse)
# generate data (same seed as in the question)
set.seed(234)
data <- stri_rand_strings(20, 1)
data <- as.data.frame(data)
Create helper table:
factorlevels <- data %>% unique() %>% mutate(order = row_number())
... and inner join it to the data:
data %>% inner_join(factorlevels)
Output:
> data %>% inner_join(factorlevels)
Joining, by = "data"
data order
1 k 1
2 m 2
3 1 3
4 m 2
5 4 4
6 d 5
7 v 6
8 i 7
9 v 6
10 H 8
11 Y 9
12 X 10
13 a 11
14 a 11
15 0 12
16 R 13
17 J 14
18 j 15
19 8 16
20 s 17
I am sure that there is a one-liner approach to this, but I could not figure it out right away.
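One possible one-liner, offered as a sketch rather than the definitive answer: base R's match() returns, for each element, the position of its first match in a lookup vector, so matching the column against its own unique values yields the first-appearance number directly. The data frame df below is made up for illustration.

```r
# Hypothetical example data (not the random strings generated above).
df <- data.frame(data = c("k", "m", "1", "m", "4", "d"),
                 stringsAsFactors = FALSE)

# Position of each value within the unique values, which are kept
# in order of first appearance.
df$order <- match(df$data, unique(df$data))

print(df$order)  # 1 2 3 2 4 5
```

The same expression should also work inside a dplyr pipeline, for example mutate(order = match(data, unique(data))), which avoids the helper table and the join.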
Upvotes: 1