Reputation: 4278
I have a data.table with 4+ columns. The first three columns together identify one unique individual.
c1 c2 c3 c4
a c e other_data
a c e other_data
a c f other_data
a c f other_data
a d f other_data
b d g other_data
# (c1 = "a" AND c2 = "c" AND c3 = "e") => one individual
# (c1 = "a" AND c2 = "c" AND c3 = "f") => another individual
I'd like to compute another column that marks each individual:
c1 c2 c3 c4 unique_individual_id
a c e other_data 1
a c e other_data 1
a c f other_data 2
a c f other_data 2
a d f other_data 3
b d g other_data 4
I would like to get a unique hash out of the content of the three columns.
How would I do that in code?
Upvotes: 2
Views: 540
Reputation: 1975
Alternatively, you can paste together the values of interest (for each row, the values in columns 1, 2, and 3), convert the result to a factor, and then convert that to an integer. This returns a unique ID number for each combination.
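df <- data.frame(c1 = c("a", "a", "b", "c", "c", "d", "d"),
                 c2 = c("a", "a", "b", "c", "d", "e", "e"),
                 c3 = c("c", "c", "d", "d", "e", "e", "e"))
# Paste the three key columns row-wise (vectorised, no sapply needed).
# The "_" separator avoids collisions such as "ab" + "c" vs "a" + "bc".
df$ID <- as.integer(factor(do.call(paste, c(df[1:3], sep = "_"))))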
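Applied to the example from the question, the same paste-then-number idea could look like the sketch below. The data frame name `individuals` and the `"|"` separator are assumptions made for illustration; the separator must not occur in the data. Note that `match()` against `unique()` numbers the combinations in order of first appearance, whereas going through a factor numbers them in sorted order.

```r
# Example data copied from the question (c4 is a placeholder value).
individuals <- data.frame(
  c1 = c("a", "a", "a", "a", "a", "b"),
  c2 = c("c", "c", "c", "c", "d", "d"),
  c3 = c("e", "e", "f", "f", "f", "g"),
  c4 = "other_data"
)

# Build one key per row from the three identifying columns.
# Assumption: "|" never appears in c1, c2, or c3.
key <- paste(individuals$c1, individuals$c2, individuals$c3, sep = "|")

# Number the keys in order of first appearance.
individuals$unique_individual_id <- match(key, unique(key))

individuals$unique_individual_id
#[1] 1 1 2 2 3 4
```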
Upvotes: 2
Reputation: 887891
We can use interaction() to create the unique index:
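# Assumes df1 is a data.frame. Select only the three identifying columns;
# df1[-4] would also pull in any columns after the fourth.
df1$unique_individual_id <- as.integer(do.call(interaction, c(df1[1:3], drop = TRUE)))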
df1$unique_individual_id
#[1] 1 1 2 2 3 4
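Since the question mentions a data.table, a grouped assignment with the `.GRP` special symbol may be a more direct route. This is only a minimal sketch: the table name `dt` is an assumption, and the data is the example from the question.

```r
library(data.table)

# Example data copied from the question (c4 is a placeholder value).
dt <- data.table(
  c1 = c("a", "a", "a", "a", "a", "b"),
  c2 = c("c", "c", "c", "c", "d", "d"),
  c3 = c("e", "e", "f", "f", "f", "g"),
  c4 = "other_data"
)

# .GRP is the group counter available inside j when `by` is used;
# groups are numbered in order of first appearance.
dt[, unique_individual_id := .GRP, by = .(c1, c2, c3)]

dt$unique_individual_id
#[1] 1 1 2 2 3 4
```

The column is added by reference, so no reassignment of `dt` is needed.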
Upvotes: 3