Reputation: 3200
My question is about how to generate a dummy variable in R from a character variable whose values are repeated. The number of times each value is repeated varies. There are several questions on this topic, but none of them seems to address my specific problem. Below is a minimal example of the data:
df <- data.frame(ID = c("C/004", "C/004", "C/005", "C/005", "C/005",
                        "C/007", "C/007", "C/007"))
The result I expect is as follows:
> df
     ID newID
1 C/004     1
2 C/004     1
3 C/005     2
4 C/005     2
5 C/005     2
6 C/007     3
7 C/007     3
8 C/007     3
I would like the resulting variable newID to be of class numeric rather than a factor, so I would prefer not to use factor(.., levels=...): it returns a factor variable, and I would also have to supply the factor levels, of which there are too many.
Any assistance would be greatly appreciated.
Upvotes: 1
Views: 1484
Reputation: 55360
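All factors are stored as integer codes underneath, and factor() works out the levels from the data, so you do not need to supply them. Therefore, if you want a numeric, simply convert: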
df$newID <- as.numeric(factor(df$ID))
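As a quick check, here is a minimal sketch that runs the conversion on the example data from the question and inspects the class of the result. It uses only base R and introduces nothing beyond the question's df.

```r
# Same data as in the question
df <- data.frame(ID = c("C/004", "C/004", "C/005", "C/005", "C/005",
                        "C/007", "C/007", "C/007"))

# factor() derives the levels from the data; as.numeric() returns the codes
df$newID <- as.numeric(factor(df$ID))

class(df$newID)
#[1] "numeric"
df$newID
#[1] 1 1 2 2 2 3 3 3
```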
Upvotes: 0
Reputation: 887391
You can do this in a few ways:
match(df$ID, unique(df$ID))
#[1] 1 1 2 2 2 3 3 3
Or
as.numeric(factor(df$ID))
#[1] 1 1 2 2 2 3 3 3
Or
cumsum(!duplicated(df$ID))
#[1] 1 1 2 2 2 3 3 3
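The three approaches agree on the question's data because the IDs there are sorted and each ID sits in one contiguous block. They can differ otherwise. The sketch below uses a made-up vector (id is a hypothetical input, not from the question) in which the IDs are unsorted and "C/007" reappears after "C/004":

```r
# Hypothetical input: unsorted IDs, with "C/007" appearing again later
id <- c("C/007", "C/007", "C/004", "C/007", "C/005")

# Numbered in order of first appearance
by_first_seen <- match(id, unique(id))
by_first_seen
#[1] 1 1 2 1 3

# Numbered by the sorted order of the levels
by_sorted <- as.numeric(factor(id))
by_sorted
#[1] 3 3 1 3 2

# Running count of distinct IDs seen so far; a repeat that is not
# adjacent to its group picks up the current count, not its own number
by_run <- cumsum(!duplicated(id))
by_run
#[1] 1 1 2 2 3
```

So match() and factor() always give the same number to the same ID (differing only in how the numbers are ordered), while the cumsum() version seems safe only when the rows of each ID are contiguous, as they are in the question.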
Upvotes: 1