Emmet B

Reputation: 5541

R creating a new column from unique rows (hashing a column)

I have a data frame like this:

id          val
3243        A
3420        B
8428        A
3420        C
9000        D

I want to create a new column that numbers the unique ids sequentially, in order of first appearance, such that

 id        val          transformed_id
 3243         A                   1
 3420         B                   2
 8428         A                   3
 3420         C                   2
 9000         D                   4

I am really clueless about this. I looked at transform and unique, and I can think of a solution in Python, but I cannot translate it to R.

Upvotes: 2

Views: 222

Answers (1)

akrun

Reputation: 887891

We can use match or factor.

We match the 'id' column with the unique elements of 'id' to get the numeric index.

 df1$transformed_id <- match(df1$id, unique(df1$id))
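
To see why this numbers the ids in order of first appearance, here is a minimal sketch on a standalone vector. The vector `ids` and the names `lookup` and `transformed` are made up for illustration and are not from the question. `unique` keeps values in the order they first appear, and `match` returns the position of each element in that lookup vector.

```r
# Hypothetical ids, deliberately not in ascending order
ids <- c(9000L, 3420L, 9000L, 3243L, 3420L)

# unique() keeps values in order of first appearance
lookup <- unique(ids)

# match() gives, for each id, its position in lookup
transformed <- match(ids, lookup)

print(lookup)       # 9000 3420 3243
print(transformed)  # 1 2 1 3 2
```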

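Or we convert 'id' to a factor, specifying the levels as the unique values of 'id', and then convert it to numeric. In this case it would also work without specifying the levels, because the default levels are sorted and the ids here happen to first appear in ascending order. In general, the levels must be specified to number the ids in order of first appearance.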

 df1$transformed_id <- as.numeric(factor(df1$id, levels=unique(df1$id)))
 df1
 #    id val transformed_id
 #1 3243   A              1
 #2 3420   B              2
 #3 8428   A              3
 #4 3420   C              2
 #5 9000   D              4
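
As a rough illustration of why the levels matter, the sketch below compares the two calls on a made-up vector whose ids do not first appear in ascending order. The names `ids`, `by_sorted`, and `by_first` are hypothetical and not part of the question's data.

```r
# Hypothetical ids, deliberately not in ascending order
ids <- c(9000L, 3420L, 9000L, 3243L, 3420L)

# Default levels are the sorted unique values, so codes follow sort order
by_sorted <- as.numeric(factor(ids))

# Explicit levels follow first appearance, so codes follow that order
by_first <- as.numeric(factor(ids, levels = unique(ids)))

print(by_sorted)  # 3 2 3 1 2
print(by_first)   # 1 2 1 3 2
```

With the data in the question both calls give the same result, since 3243, 3420, 8428, 9000 are already in ascending order when they first appear.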

data

df1 <- structure(list(id = c(3243L, 3420L, 8428L, 3420L, 9000L),
                      val = c("A", "B", "A", "C", "D")),
                 .Names = c("id", "val"), class = "data.frame",
                 row.names = c(NA, -5L))

Upvotes: 2
