Oshrat

Reputation: 195

How to convert from category to numeric in R

Here is my problem:

I have a table with categories and I want to rank them:

category
dog
cat
fish
dog
dog

What I want is to add a column that ranks them:

category       rank    
dog             1  
cat             2
fish            3
dog             1
dog             1

Thanks!

Upvotes: 8

Views: 27265

Answers (4)

Leonardo Ferreira

Reputation: 76

This worked beautifully for me:

category = as.numeric(factor(as.vector(category)))
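
Note that factor() sorts its levels alphabetically by default, so the line above numbers the question's data as cat = 1, dog = 2, fish = 3 rather than dog = 1. If the codes should follow the order of first appearance, as in the desired output in the question, a minimal sketch could look like the following. The sample vector is copied from the question, and the names alphabetical and by_appearance are made up for illustration:

```r
# Sample data copied from the question
category <- c("dog", "cat", "fish", "dog", "dog")

# Default behaviour: levels are sorted alphabetically (cat, dog, fish)
alphabetical <- as.numeric(factor(category))

# Levels given explicitly in order of first appearance (dog, cat, fish)
by_appearance <- as.numeric(factor(category, levels = unique(category)))

print(alphabetical)   # 2 1 3 2 2
print(by_appearance)  # 1 2 3 1 1
```

Passing levels = unique(category) is what fixes the numbering; match(category, unique(category)) should give the same codes without building a factor.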

Upvotes: 0

Amol Modi

Reputation: 311

This assumes category is a factor. If it is not, convert it to one:

category <- as.factor(category)

You can use the relevel function to make "dog" the first level (level 1) as follows:

category <- relevel(category, ref = "dog")

and then create a data frame using the following code:

df <- data.frame(category,as.numeric(category))
colnames(df) <- c("category","rank")

Applied to a factor, as.numeric returns the integer codes of its levels, which are the ranks in your case.
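
To make the steps in this answer concrete, here is a small end-to-end sketch on the data from the question. It assumes that relevel is applied to the factor itself (it returns a new factor with the reference level moved to the front), and the variable name codes is only illustrative:

```r
# Sample data copied from the question
category <- factor(c("dog", "cat", "fish", "dog", "dog"))
print(levels(category))   # "cat" "dog" "fish" (alphabetical by default)

# relevel() returns a factor with "dog" moved to the first position;
# the remaining levels keep their previous order
category <- relevel(category, ref = "dog")
print(levels(category))   # "dog" "cat" "fish"

# The integer codes of the factor are positions in levels(category)
codes <- as.integer(category)

df <- data.frame(category = category, rank = codes)
print(df$rank)            # 1 2 3 1 1
```

Since relevel only moves one level, this matches the desired output here because the remaining levels (cat, fish) already happen to be in the wanted order; with more categories, setting all levels explicitly is probably safer.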

Upvotes: 0

alexis_laz

Reputation: 13122

Just for the sake of completeness and because the solution I posted in a comment is an inefficient (and pretty ugly) fix, I'll post an answer too.

It turned out that the OP's starting point was something like the following:

x = c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish")
x = factor(x)

In the end, a manually specified numerical categorization of x was wanted. As an example, let's suppose that the following matching is wanted:

cat -> 1, dog -> 2, fish -> 3, catfish -> 4

So, some alternatives:

sapply(as.character(x), switch,
       "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4,
       USE.NAMES = FALSE)
#[1] 1 2 3 2 2 1 3 4

#note that match's internal 'do_match' calls 'match_transform', which coerces
#`factor` to `character`, so there is no need for 'as.character(x)'
#(http://svn.r-project.org/R/trunk/src/main/unique.c)
match(x, c("cat", "dog", "fish", "catfish"))
#[1] 1 2 3 2 2 1 3 4

local({    #just to not change 'x'
    levels(x) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4)
    as.numeric(x)
})
#[1] 1 2 3 2 2 1 3 4

library(fastmatch)
fmatch(x, c("cat", "dog", "fish", "catfish"))  #a faster alternative to 'match'
#[1] 1 2 3 2 2 1 3 4
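
One more alternative, which is not part of the original list and was not included in the benchmark, is a plain named lookup vector. This is only a sketch under the same assumed mapping as above; the names codes and result are made up for illustration:

```r
# Same example data as above
x <- factor(c("cat", "dog", "fish", "dog", "dog", "cat", "fish", "catfish"))

# Hypothetical lookup table: names are the categories, values the wanted codes
codes <- c(cat = 1, dog = 2, fish = 3, catfish = 4)

# Index by the labels, not by the factor itself: subsetting with a factor
# uses its internal integer codes, which would silently give a wrong mapping
result <- unname(codes[as.character(x)])

print(result)  # 1 2 3 2 2 1 3 4
```

The as.character(x) step matters here, unlike with match, because the levels of x are stored alphabetically (cat, catfish, dog, fish) and do not line up with the positions in codes.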

And a benchmark on a larger vector:

X = rep(as.character(x), 1e5)
X = factor(X)
f1 = function() sapply(as.character(X), switch,
                       "cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4,
                       USE.NAMES = FALSE)
f2 = function() match(X, c("cat", "dog", "fish", "catfish"))
f3 = function() {
    levels(X) = list("cat" = 1, "dog" = 2, "fish" = 3, "catfish" = 4)
    as.numeric(X)
}
library(fastmatch)
f4 = function() fmatch(X, c("cat", "dog", "fish", "catfish"))

library(microbenchmark)
microbenchmark(f1(), f2(), f3(), f4(), times = 10)
#Unit: milliseconds
# expr         min          lq      median         uq       max neval
# f1() 1745.111666 1816.675337 1961.809102 2107.98236 2896.0291    10
# f2()   22.043657   22.786647   23.987263   31.45057  111.9600    10
# f3()   32.704779   32.919150   38.865853   47.67281  134.2988    10
# f4()    8.814958    8.823309    9.856188   19.66435  104.2827    10
sum(f1() != f2())
#[1] 0
sum(f2() != f3())
#[1] 0
sum(f3() != f4())
#[1] 0

Upvotes: 5

Roland

Reputation: 132576

I assume that if you write "ranks" you mean ranks. I further assume you want to rank according to the number of occurrences.

cats <- factor(c("dog", "cat", "fish", "dog", "dog"))

#see help("rank") for other possibilities to break ties
ranks <- rank(-table(cats), ties.method="first")

DF <- data.frame(category=cats, rank=ranks[as.character(cats)])

print(DF)
#   category rank
# 1      dog    1
# 2      cat    2
# 3     fish    3
# 4      dog    1
# 5      dog    1
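
As an illustration of the tie-breaking remark in the code comment above, here is a sketch with ties.method = "min", one of the documented options of rank, under which categories with equal counts share a rank. The names counts and ranks_min are made up for this example:

```r
# Same example data as above
cats <- factor(c("dog", "cat", "fish", "dog", "dog"))

# Number of occurrences per category: cat 1, dog 3, fish 1
counts <- table(cats)

# Negate the counts so the most frequent category gets rank 1;
# "min" gives tied categories the same (lowest) rank
ranks_min <- rank(-counts, ties.method = "min")

DF <- data.frame(category = cats,
                 rank = unname(ranks_min[as.character(cats)]))
print(DF$rank)  # 1 2 2 1 1
```

With "first", cat and fish get ranks 2 and 3 in level order; with "min" both get 2, which may be preferable when the order among equally frequent categories carries no meaning.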

Upvotes: 2
