Tahnoon Pasha

Reputation: 6018

R create a unique key from a set of string data

Is there a quick algorithm/function to convert a string into an integer in R?

I have a data frame that looks like

id_1 id_2 id_3 date        value
1     2    3   2012-11-18   50
1     1    4   2012-05-07   100

and

strtoi(paste(df[,1], df[,3], df[,4], sep='_')) gives me NA

I'm trying to set up a unique primary key that I can use to do some basic arithmetic.
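For context, here is a short check of why strtoi() returns NA (a sketch; the strings are typed in by hand to mirror the first row of the example data). strtoi() parses the whole string as an integer in base 10 by default and returns NA when anything other than digits is left over, such as the "_" from paste() or the "-" in the date. It also returns NA when the number would not fit in R's 32-bit integer type, so stripping the separators does not help once several columns are glued together.

```r
# The string the question builds for the first row (columns 1, 3 and 4)
pasted <- paste(1, 3, "2012-11-18", sep = "_")
strtoi(pasted)              # NA: "_" and "-" are not digits

# Digits alone parse fine while they fit in a 32-bit integer
strtoi("123")               # 123

# All five fields of row 1 with separators removed exceed .Machine$integer.max
strtoi("1232012111850")     # NA: overflow
```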

Thanks

Upvotes: 3

Views: 4010

Answers (3)

agstudy

Reputation: 121608

Another option to create a unique key per row is to use interaction(), for example:

 transform(dat, id = interaction(dat))

 id_1 id_2 id_3       date value                   id
1    1    2    3 2012-11-18    50  1.2.3.2012-11-18.50
2    1    1    4 2012-05-07   100 1.1.4.2012-05-07.100

EDIT

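The default behavior is to retain all factor levels, i.e. every possible combination of the levels of the input columns, most of which never occur in the data. It is better here to use drop = TRUE, so unused level combinations are dropped from the result.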

 transform(dat, id = interaction(dat, drop = TRUE))

 id_1 id_2 id_3       date value                   id
1    1    2    3 2012-11-18    50  1.2.3.2012-11-18.50
2    1    1    4 2012-05-07   100 1.1.4.2012-05-07.100
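Since the question asks for an integer key, one possible follow-up is to take the integer codes of the interaction factor with as.integer(). This is a sketch that assumes the example data is rebuilt as dat with the date stored as character. It also shows how many levels drop = TRUE removes. The codes follow the order of the interaction levels, not the row order, so treat them as arbitrary identifiers.

```r
# Rebuild the example data from the question (date kept as character)
dat <- data.frame(
  id_1  = c(1, 1),
  id_2  = c(2, 1),
  id_3  = c(3, 4),
  date  = c("2012-11-18", "2012-05-07"),
  value = c(50, 100),
  stringsAsFactors = FALSE
)

# Without drop = TRUE every combination of levels is kept: 1 * 2 * 2 * 2 * 2 = 16
nlevels(interaction(dat))

# With drop = TRUE only the combinations that actually occur remain
key_factor <- interaction(dat, drop = TRUE)
nlevels(key_factor)

# The underlying integer codes can serve as a numeric key
dat$key <- as.integer(key_factor)
dat
```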

Upvotes: 4

Ricardo Saporta

Reputation: 55420

digest, as @lokheart pointed out, is great.

Another option is to simply use factors. Factors are stored as integer codes, so they are numbers too: you get their numeric value by coercing with as.numeric().

 # df is the data frame from the question
 kvpairs <- factor(apply(df, 1, paste, collapse = ""))

Now you have a pairing between the levels (the concatenated row strings) and the underlying numeric values.

# the numeric key of the first value
> as.numeric(kvpairs)[[1]]
[1] 2

# the value of key==2
> levels(kvpairs)[2]
[1] "1232012-11-18 50"


> kvpairs
[1] 1232012-11-18 50 1142012-05-07100
Levels: 1142012-05-07100 1232012-11-18 50

Note that if you add a duplicate row, its concatenated string is identical, so it maps to the same level and therefore the same numeric key.
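One caveat with collapse = "": two different rows can concatenate to the same string and then share a level. The minimal sketch below uses a made-up two-column data frame (this X is hypothetical, not the question's data) to show the collision and how a separator such as "_" avoids it. Relatedly, when the data frame mixes numbers and text, apply() first converts it with as.matrix(), which format()s numeric columns to a common width; that is where the space in "1232012-11-18 50" comes from.

```r
# Hypothetical data: the two rows differ, but their digits run together
X <- data.frame(a = c(1, 11), b = c(12, 2))

# collapse = "" turns both rows into "112", so they collapse into one level
naive <- factor(apply(X, 1, paste, collapse = ""))
nlevels(naive)   # 1

# A separator that cannot appear inside the values keeps the rows apart
keyed <- factor(apply(X, 1, paste, collapse = "_"))
nlevels(keyed)   # 2
as.integer(keyed)
```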

Upvotes: 5

lokheart

Reputation: 24675

Use the digest package:

library(digest)
temp <- data.frame(x1 = c(1:5, 1), x2 = c(2:6, 2), stringsAsFactors = FALSE)
# One hash per row; identical rows (rows 1 and 6 here) get identical hashes
temp <- data.frame(temp, uid = apply(temp, 1, digest), stringsAsFactors = FALSE)
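digest() returns a character hash (MD5 by default), not an integer. If an integer key is needed for arithmetic, one option is to number the distinct row keys with match(). The sketch below uses a pasted row string as a stand-in so it runs without the digest package; the same match(uid, unique(uid)) line should work unchanged on a uid column of hashes.

```r
# Same data as above
temp <- data.frame(x1 = c(1:5, 1), x2 = c(2:6, 2))

# Stand-in for the digest column: one character key per row
uid <- apply(temp, 1, paste, collapse = "_")

# Number distinct keys by first appearance; duplicate rows share a number
temp$key <- match(uid, unique(uid))
temp$key   # 1 2 3 4 5 1
```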

Upvotes: 6
