Reputation: 22529
I have a data frame that looks something like this:
column1 column2
asdf qwer
fghj qwer
asdf mkop
fghj mkop
yuio lops
As you can see, the string values themselves don't mean anything; I only care whether two strings are the same. How can I recode the data frame so it looks something like this?
column1 column2
1 1
2 1
1 2
2 2
3 3
Upvotes: 0
Views: 46
Reputation: 37889
You say that those columns are in a data frame, so I assume they are factors. If not, it is easy to turn them into factors with the as.factor() function. After that, convert them to numeric and you have what you want. For example:
column1 <- c('asdf','bjel','cdea','asdf','asdf','bjel')
df <- data.frame(column1)
df$column1 <- as.factor(df[['column1']]) # use this first if your column is of type character
df$column1 <- as.numeric(df[['column1']])
> str(df)
'data.frame': 6 obs. of 1 variable:
$ column1: num 1 2 3 1 1 2
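Note that as.factor() numbers the values by sorted level order, which matches the order of appearance here only because the example strings happen to be alphabetical. A minimal sketch of one way to get the numbering shown in the question is to set the levels to unique(x), so codes follow the order of first appearance. The helper name code_by_appearance is made up for illustration:

```r
# The question's example data, kept as plain character columns.
dd <- data.frame(
  column1 = c("asdf", "fghj", "asdf", "fghj", "yuio"),
  column2 = c("qwer", "qwer", "mkop", "mkop", "lops"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number values by order of first appearance
# instead of by sorted level order.
code_by_appearance <- function(x) {
  x <- as.character(x)                       # also works if x is already a factor
  as.integer(factor(x, levels = unique(x)))  # levels in order of first appearance
}

dd_coded <- dd
dd_coded[] <- lapply(dd, code_by_appearance)  # dd_coded[] keeps the data frame shape
print(dd_coded)
#   column1 column2
# 1       1       1
# 2       2       1
# 3       1       2
# 4       2       2
# 5       3       3
```

Equal strings still get equal codes within a column, which is all the question asks for; only the numbering order changes.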
Upvotes: 1
Reputation: 226871
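It's pretty easy, since the underlying structure of a factor in R is just the numeric codes plus a set of "levels" (labels). Note that strings were read in as factors by default only before R 4.0.0; in later versions you have to ask for it with stringsAsFactors=TRUE, otherwise as.numeric() on the character columns returns NA.
dd <- read.table(header=TRUE,stringsAsFactors=TRUE,text="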
column1 column2
asdf qwer
fghj qwer
asdf mkop
fghj mkop
yuio lops
")
dd[] <- lapply(dd,as.numeric)
Use the line above if you want to replace your original data set; otherwise use:
dd2 <- as.data.frame(lapply(dd,as.numeric))
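To make the "codes plus levels" structure concrete, here is a small sketch using the values of column2 from the question. It also shows that the default levels are sorted, so the resulting numbers need not follow the order in which the strings appear in the data:

```r
# column2 from the question, stored as a factor
x <- factor(c("qwer", "qwer", "mkop", "mkop", "lops"))

levels(x)      # the labels, sorted: "lops" "mkop" "qwer"
as.integer(x)  # the underlying codes: 3 3 2 2 1
```

Identical strings still map to identical codes, so this is fine if you only care about which values are equal.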
Upvotes: 1