amo

Reputation: 3200

Generate dummy variable with multiple levels in R

My question is how to generate a dummy variable (one numeric code per distinct value) from a character variable whose values repeat, in R. The number of times each value is repeated varies. There are several questions on this topic, but none of them seem to address my specific problem. Below is a minimal example of the data:

df <- data.frame(ID=c("C/004","C/004","C/005","C/005","C/005","C/007", 
                    "C/007", "C/007"))

The result I expect is as follows:

    > df
         ID newID
    1 C/004     1
    2 C/004     1
    3 C/005     2
    4 C/005     2
    5 C/005     2
    6 C/007     3
    7 C/007     3
    8 C/007     3

I would like the resulting variable newID to be of numeric class, not a factor, so I would rather not use factor(.., levels=...): it produces a factor variable, and it would also require me to supply the factor levels, of which there are too many.

Any assistance would be greatly appreciated.

Upvotes: 1

Views: 1484

Answers (2)

Ricardo Saporta

Reputation: 55360

Factors are stored as integer codes underneath, and factor() works out the levels itself, so you don't have to supply them. Therefore, if you want a numeric variable, simply convert:

  df$newID <- as.numeric(factor(df$ID))
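A minimal sketch of why this works, using the question's data: factor() builds the levels automatically (sorted by default), and the integer codes index into those levels. Note that as.integer() returns class "integer", which is.numeric() still treats as numeric; as.numeric(), as in the answer above, returns a double if that is specifically required.

```r
df <- data.frame(ID = c("C/004", "C/004", "C/005", "C/005", "C/005",
                        "C/007", "C/007", "C/007"))

# factor() derives the levels itself (sorted), so none need to be listed by hand
f <- factor(df$ID)
levels(f)          # "C/004" "C/005" "C/007"

# The underlying integer codes are positions in levels(f)
df$newID <- as.integer(f)
df$newID           # 1 1 2 2 2 3 3 3
is.numeric(df$newID)  # TRUE, although class() reports "integer"
```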

Upvotes: 0

akrun

Reputation: 887391

You can do this in several ways:

match(df$ID, unique(df$ID))
#[1] 1 1 2 2 2 3 3 3

Or

as.numeric(factor(df$ID))
#[1] 1 1 2 2 2 3 3 3

Or

cumsum(!duplicated(df$ID))
#[1] 1 1 2 2 2 3 3 3
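The three approaches agree on the question's data only because the IDs are already sorted and equal IDs are adjacent. A small sketch on a hypothetical unsorted vector (not from the question) shows how they differ: match() numbers by first appearance, factor() by sorted level order, and cumsum(!duplicated()) only yields a valid group index when equal values are contiguous.

```r
# Hypothetical unsorted, non-contiguous IDs (illustration only)
x <- c("C/007", "C/004", "C/007", "C/005", "C/004")

# Code = order of first appearance: unique(x) is "C/007" "C/004" "C/005"
by_first <- match(x, unique(x))          # 1 2 1 3 2

# Code = position in sorted levels "C/004" "C/005" "C/007"
by_sorted <- as.numeric(factor(x))       # 3 1 3 2 1

# Counter that increments at each new value; repeated values get the
# current count, not their group's number, so the 3rd element is wrong here
by_cumsum <- cumsum(!duplicated(x))      # 1 2 2 3 3
```

If the data may not be grouped, match(x, unique(x)) or factor() are the safer choices; sorting the data by ID first also makes all three agree.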

Upvotes: 1
