François M.

Reputation: 4278

Get a unique hash value out of a combination of columns

I have a data.table with 4+ columns. The first 3, taken together, identify one unique individual.

c1   c2   c3   c4
a    c    e    other_data
a    c    e    other_data
a    c    f    other_data
a    c    f    other_data
a    d    f    other_data
b    d    g    other_data

# (c1 = "a" AND c2 = "c" AND c3 = "e") => one individual
# (c1 = "a" AND c2 = "c" AND c3 = "f") => another individual

I'd like to compute another column that identifies each individual:

c1   c2   c3   c4           unique_individual_id
a    c    e    other_data   1
a    c    e    other_data   1
a    c    f    other_data   2
a    c    f    other_data   2 
a    d    f    other_data   3 
b    d    g    other_data   4

I would like to get a unique hash out of the content of the 3 columns.

How would I do that in code?

Upvotes: 2

Views: 540

Answers (3)

Damiano Fantini

Reputation: 1975

Alternatively, you can paste the values of interest together (for each row, paste the values in columns 1, 2, and 3), convert the result to a factor, and then to an integer. This returns a unique ID number for each combination.

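# Example data: three key columns, kept as character rather than factor
df <- data.frame(c1 = c("a", "a", "b", "c", "c", "d", "d"),
                 c2 = c("a", "a", "b", "c", "d", "e", "e"),
                 c3 = c("c", "c", "d", "d", "e", "e", "e"),
                 stringsAsFactors = FALSE)
# Paste columns 1-3 of each row into one key, then turn the keys into integer codes.
# The "_" separator keeps "ab" + "c" distinct from "a" + "bc".
df$ID <- as.integer(as.factor(sapply(seq_len(nrow(df)), function(i) paste(df[i, 1:3], collapse = "_"))))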
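
The row-by-row sapply call can also be replaced by a single vectorised paste over the columns. This is a minimal sketch, assuming the key columns are plain character vectors and that the separator "\r" never occurs in the data. The column names x, y, z and the use of match() are choices made here for illustration, not part of the answer above. match() numbers the keys in order of first appearance, whereas the factor approach numbers them in sorted order.

```r
# Example data with hypothetical column names x, y, z.
df <- data.frame(x = c("a", "a", "b", "c", "c", "d", "d"),
                 y = c("a", "a", "b", "c", "d", "e", "e"),
                 z = c("c", "c", "d", "d", "e", "e", "e"),
                 stringsAsFactors = FALSE)

# Build one key per row in a single vectorised call.
# Assumption: "\r" does not appear in any of the key columns.
key <- do.call(paste, c(df[1:3], sep = "\r"))

# Number each distinct key by the position of its first appearance.
df$ID <- match(key, unique(key))
df$ID
#[1] 1 1 2 3 4 5 5
```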

Upvotes: 2

d.b

Reputation: 32558

as.numeric(as.factor(with(df, paste(c1, c2, c3))))
#[1] 1 1 2 2 3 4
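
To run the one-liner above as is, df needs columns named c1, c2 and c3. A self-contained sketch follows. The table is rebuilt from the question, and the way it is constructed is an assumption, as is the choice to store the result in unique_individual_id.

```r
# Rebuild the table from the question (c4 is filler, as in the question).
df <- data.frame(c1 = c("a", "a", "a", "a", "a", "b"),
                 c2 = c("c", "c", "c", "c", "d", "d"),
                 c3 = c("e", "e", "f", "f", "f", "g"),
                 c4 = "other_data",
                 stringsAsFactors = FALSE)

# factor() sorts the distinct keys, so the IDs follow sorted key order.
df$unique_individual_id <- as.integer(factor(paste(df$c1, df$c2, df$c3)))
df$unique_individual_id
#[1] 1 1 2 2 3 4
```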

Upvotes: 7

akrun

Reputation: 887891

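We can use interaction() on the key columns (every column except the 4th) to create the unique index; drop = TRUE discards combinations that never occur: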

df1$unique_individual_id <- as.integer(do.call(interaction, c(df1[-4], drop = TRUE)))
df1$unique_individual_id
#[1] 1 1 2 2 3 4
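
Since the question mentions a data.table, the same grouping could also be sketched with data.table's .GRP symbol, which holds the running group number inside a grouped call. This is an illustration under assumptions rather than part of the answer above: the data.table package is assumed to be installed, and the object name dt is hypothetical.

```r
library(data.table)  # assumption: the data.table package is installed

# Rebuild the table from the question as a data.table (dt is a hypothetical name).
dt <- data.table(c1 = c("a", "a", "a", "a", "a", "b"),
                 c2 = c("c", "c", "c", "c", "d", "d"),
                 c3 = c("e", "e", "f", "f", "f", "g"),
                 c4 = "other_data")

# .GRP is the group counter; groups are numbered in order of first appearance.
dt[, unique_individual_id := .GRP, by = .(c1, c2, c3)]
dt$unique_individual_id
#[1] 1 1 2 2 3 4
```

This adds the column by reference, without building an intermediate pasted key.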

Upvotes: 3
