R dataframe each row has unique value

Question

I would like to change a dataframe so that it only contains unique values on each row. For example, suppose I have a dataframe like this:

person1 person2 person3
1          2       NA
4          4       5 
6          NA      NA

But I want to change it so that each row contains only one unique value:

person1   person2   person3
1          NA       NA
NA         2        NA
NA         NA       NA
4          4        NA
NA         NA       5
6          NA       NA

The end goal is to build an incidence matrix, like this:

    person1   person2   person3
1      1         0         0
2      0         1         0
3      0         0         0
4      1         1         0
5      0         0         1
6      1         0         0

Does someone have a suggestion on how to do this with R?

Joe · Accepted Answer

One way is to allocate a matrix with as many rows as the highest value in the data frame, then use a simple loop to fill it with 1s in the correct positions.

Let's call the allocated matrix output and give it the same colnames as the original data frame.

max.value <- max(df, na.rm = TRUE)
output <- matrix(0, nrow = max.value, ncol=ncol(df))
colnames(output) <- colnames(df)

Now we have a 6x3 matrix of zeros. A simple nested loop then fills it in: for each column j, the inner loop takes every value i found in that column of df and sets row i of column j in output to 1.

for (j in seq_len(ncol(output))) {   # for each column of the output matrix
  for (i in na.omit(df[, j])) {      # for each non-NA value in that column of df
    output[i, j] <- 1                # mark row i of column j
  }
}

> output
     person1 person2 person3
[1,]       1       0       0
[2,]       0       1       0
[3,]       0       0       0
[4,]       1       1       0
[5,]       0       0       1
[6,]       1       0       0

This should work with as many people and rows as you need.
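If you would rather avoid the explicit loop, the same idea can be sketched with %in%: for each column, test which of the row numbers 1 to max.value occur in it. This is only a sketch under the same assumptions as above (positive integer values, NA for missing entries); the name incidence is made up for illustration, and the test data is rebuilt inline so the snippet runs on its own.

```r
# Test data, rebuilt here so the snippet is self-contained
df <- data.frame(person1 = c(1L, 4L, 6L),
                 person2 = c(2L, 4L, NA),
                 person3 = c(NA, 5L, NA))

max.value <- max(df, na.rm = TRUE)

# For each column, flag which row numbers 1..max.value appear in it.
# NA entries never match a row number, so they are ignored.
incidence <- sapply(df, function(col) as.integer(seq_len(max.value) %in% col))

incidence
```

sapply keeps the column names of df, so the result is a 6x3 integer matrix with the same 0/1 pattern as output above.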

Addendum: here's the dput of the test data frame, assigned to df so the code above runs as is.

df <- structure(list(person1 = c(1L, 4L, 6L), person2 = c(2L, 4L, NA
), person3 = c(NA, 5L, NA)), .Names = c("person1", "person2", 
"person3"), class = "data.frame", row.names = c(NA, -3L))
