Combining columns to preserve uniqueness

Question

I need to combine multiple columns into a single "grouping" variable, as in the "Paste multiple columns together" thread. The problem is that I want the result to be robust to columns whose strings have similar content, e.g.

tmp1 <- data.frame(V1 = c("a", "aa", "a",  "b", "bb", "aa"),
                   V2 = c("a", "a",  "aa", "b", "b",  "a"))

tmp2 <- data.frame(V1 = c("+",  "++", "+-", "-|",  "||"),
                   V2 = c("-|", "--", "++", "|-+", "|"))

For data like the above, using apply(x, 1, paste, collapse = sep) with common separators such as "", |, - or + would fail: the column boundaries become unidentifiable in the output, so different combinations of values can end up merged into the same string.
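
To make the failure concrete, here is a minimal sketch using tmp1 and an empty separator. The names pasted and collides are only illustrative, and stringsAsFactors = FALSE is set explicitly so the snippet behaves the same on older R versions.

```r
tmp1 <- data.frame(V1 = c("a", "aa", "a",  "b", "bb", "aa"),
                   V2 = c("a", "a",  "aa", "b", "b",  "a"),
                   stringsAsFactors = FALSE)

# Paste the values of each row together with an empty separator
pasted <- unname(apply(tmp1, 1, paste, collapse = ""))

# Rows 2 and 3 hold different pairs, ("aa", "a") vs ("a", "aa"),
# yet both collapse to the same string "aaa"
collides <- pasted[2] == pasted[3]
print(pasted)
print(collides)
```

Any separator that can also occur inside the values has the same weakness, which is why tmp2 is built from |, - and +.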

The columns can be assumed to be of different types (numeric, factor, character etc.).

The expected output is a vector with one ID per row, where each ID corresponds to a unique combination of values across the columns. The actual form of the IDs is not important to me. For example,

1 2 3 4 5 2

for the tmp1 data.
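
One way such IDs could be produced, as a rough sketch rather than a tuned solution: replace each column with integer codes first, so the pasted keys contain only digits and a separator that cannot appear in them. The helper name row_ids is hypothetical, and I have not benchmarked this on large data.

```r
# Hypothetical helper: one integer ID per row, numbered by first appearance
row_ids <- function(df) {
  # Recode every column as integers; match() also handles factors and numerics
  codes <- lapply(df, function(col) match(col, unique(col)))
  # The keys now consist of digits only, so "_" cannot clash with the data
  key <- do.call(paste, c(unname(codes), sep = "_"))
  match(key, unique(key))
}

tmp1 <- data.frame(V1 = c("a", "aa", "a",  "b", "bb", "aa"),
                   V2 = c("a", "a",  "aa", "b", "b",  "a"))
tmp2 <- data.frame(V1 = c("+",  "++", "+-", "-|",  "||"),
                   V2 = c("-|", "--", "++", "|-+", "|"))

ids1 <- row_ids(tmp1)
ids2 <- row_ids(tmp2)
print(ids1)
print(ids2)
```

This still goes through paste, only on integer codes instead of the raw values, so it may not be the fastest option.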

Can you suggest a better way to do this? Please note that I am concerned with performance.
