SuperString

Reputation: 22529

Refactoring hashStrings to ints in R

I have a data frame that looks something like this:

column1     column2
asdf        qwer
fghj        qwer
asdf        mkop 
fghj        mkop
yuio        lops

As you can see, the string values themselves don't mean anything; I only care whether two hash strings are the same. How can I recode the columns so the data frame looks something like this?

column1     column2
1           1
2           1
1           2 
2           2
3           3

Upvotes: 0

Views: 46

Answers (2)

LyzandeR

Reputation: 37889

You say that those columns are in a data frame, so I assume they are factors. If they are not, it is easy to turn them into factors with the as.factor() function. After that, convert each factor to numeric (which returns its level codes) and you have what you want. For example:

column1 <- c('asdf','bjel','cdea','asdf','asdf','bjel')
df <- data.frame(column1)
df$column1 <- as.factor(df[['column1']]) # use this first if your column is of type character
df$column1 <- as.numeric(df[['column1']])

> str(df)
'data.frame':   6 obs. of  1 variable:
 $ column1: num  1 2 3 1 1 2
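
One detail worth checking against the question: factor levels are sorted by default, so the codes follow the sorted order of the strings rather than the order in which they first appear. A minimal sketch, assuming you want the first-appearance numbering shown in the question (the variable names here are illustrative only):

```r
# Example vector taken from column2 of the question
column2 <- c("qwer", "qwer", "mkop", "mkop", "lops")

# Default factor levels are sorted, so the codes follow alphabetical order
codes_sorted <- as.integer(factor(column2))

# Setting the levels to unique(column2) numbers values by first appearance
codes_first_seen <- as.integer(factor(column2, levels = unique(column2)))

print(codes_sorted)      # [1] 3 3 2 2 1
print(codes_first_seen)  # [1] 1 1 2 2 3
```

Either numbering preserves which rows share a value; they differ only in which integer each string receives.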

Upvotes: 1

Ben Bolker

Reputation: 226871

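It's pretty easy, since the underlying structure of a factor in R is just the integer codes plus a set of "levels" (labels). Strings were read in as factors by default before R 4.0.0; in R 4.0.0 and later you have to request that with stringsAsFactors=TRUE, otherwise as.numeric() on the character columns returns NA.

dd <- read.table(header=TRUE,stringsAsFactors=TRUE,text="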
column1     column2
asdf        qwer
fghj        qwer
asdf        mkop 
fghj        mkop
yuio        lops
")

dd[] <- lapply(dd,as.numeric)

if you want to replace your original data set; otherwise, to keep it and create a converted copy:

dd2 <- as.data.frame(lapply(dd,as.numeric))
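
If the columns might be either character or factor (depending on R version or how the data was read), a variant that does not rely on factor storage can be sketched with match() and unique(). The helper name to_codes is hypothetical and not part of the answer above; it numbers values by first appearance within each column, which reproduces the layout asked for in the question.

```r
# Hypothetical helper (not from the answers): integer codes for one column,
# whether it was read as character or as factor.
to_codes <- function(x) {
  x <- as.character(x)   # drop any factor structure so both types behave alike
  match(x, unique(x))    # position of each value among the distinct values
}

# The question's data, built directly so the example is self-contained
dd <- data.frame(
  column1 = c("asdf", "fghj", "asdf", "fghj", "yuio"),
  column2 = c("qwer", "qwer", "mkop", "mkop", "lops")
)

dd2 <- dd
dd2[] <- lapply(dd2, to_codes)  # dd2[] keeps the data frame shape and names
print(dd2)
```

Codes are assigned per column, so the same integer in column1 and column2 does not imply the same underlying string.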

Upvotes: 1
