trika

Reputation: 79

R dataframe each row has unique value

I would like to change a dataframe so that it only contains unique values on each row. For example, suppose I have a dataframe like this:

person1 person2 person3
1          2       NA
4          4       5 
6          NA      NA

But I want to change it so that each row holds only a single unique value:

person1   person2   person3
1          NA       NA
NA         2        NA
NA         NA       NA
4          4        NA
NA         NA       5
6          NA       NA

The end goal is to build an incidence matrix, like this:

    person1   person2   person3
1      1         0         0
2      0         1         0
3      0         0         0
4      1         1         0
5      0         0         1
6      1         0         0

Does someone have a suggestion on how to do this with R?

Upvotes: 1

Views: 371

Answers (2)

Joe

Reputation: 8621

One way could be to allocate yourself a matrix of as many rows as the highest value in the data frame, then use a simple loop to fill it with 1s in the correct positions.

Let's call the allocated matrix output and give it the same colnames as the original data frame.

max.value <- max(df, na.rm = TRUE)
output <- matrix(0, nrow = max.value, ncol = ncol(df))
colnames(output) <- colnames(df)

Now we have a 6x3 matrix of zeros. A simple nested loop then goes through each column j of df and, for every value i found in that column, sets output[i, j] to 1.

for (j in seq_len(ncol(output))) {  # for each column of the output matrix
  for (i in na.omit(df[, j])) {     # for each non-missing value in that column of df
    output[i, j] <- 1               # mark that value's row in column j
  }
}

> output
     person1 person2 person3
[1,]       1       0       0
[2,]       0       1       0
[3,]       0       0       0
[4,]       1       1       0
[5,]       0       0       1
[6,]       1       0       0

Should work with as many people and rows as you need.

Addendum: here's the dput of the test data frame.

structure(list(person1 = c(1L, 4L, 6L), person2 = c(2L, 4L, NA
), person3 = c(NA, 5L, NA)), .Names = c("person1", "person2", 
"person3"), class = "data.frame", row.names = c(NA, -3L))
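As a possible alternative to the nested loop, the same matrix can be filled in one step with matrix indexing. This is only a sketch under the assumption that all values are positive integers; the names values, cols, keep and incidence are made up for illustration and are not part of the answer above.

```r
# Test data, equivalent to the dput above
df <- data.frame(
  person1 = c(1L, 4L, 6L),
  person2 = c(2L, 4L, NA),
  person3 = c(NA, 5L, NA)
)

# Flatten the data frame column by column and record which column each value came from
values <- unlist(df, use.names = FALSE)
cols   <- rep(seq_along(df), each = nrow(df))
keep   <- !is.na(values)                      # ignore missing entries

# Allocate the zero matrix, then set every (value, column) pair to 1 at once
incidence <- matrix(0L, nrow = max(values, na.rm = TRUE), ncol = ncol(df),
                    dimnames = list(NULL, names(df)))
incidence[cbind(values[keep], cols[keep])] <- 1L

incidence
```

Indexing a matrix with a two-column matrix treats each row of the index as a (row, column) pair, so no explicit loop is needed.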

Upvotes: 1

Michael Griffiths

Reputation: 1427

This doesn't fill in "missing" values (e.g. no one has a 3) but will create a sparse incidence matrix.

library(tidyverse)

data = tribble(
  ~person1, ~person2, ~person3,
   1,        2,        NA,
   4,        4,        5,
   6,        NA,       NA
  )

data %>% 
  gather(key, value, na.rm = TRUE) %>% 
  xtabs(~ value + key, data = ., sparse = TRUE)

#> 5 x 3 sparse Matrix of class "dgCMatrix"
#>   person1 person2 person3
#> 1       1       .       .
#> 2       .       1       .
#> 4       1       1       .
#> 5       .       .       1
#> 6       1       .       .

If you also want rows for the "missing" values, convert the value column to a factor that lists every level you expect.

For instance:

data %>% 
  gather(key, value, na.rm = TRUE) %>% 
  # Add factor with levels 1:6 --> 1, 2, 3, 4, 5, 6
  mutate(value = factor(value, levels = 1:6)) %>% 
  xtabs(~ value + key, data = ., sparse = TRUE)

#> 6 x 3 sparse Matrix of class "dgCMatrix"
#>   person1 person2 person3
#> 1       1       .       .
#> 2       .       1       .
#> 3       .       .       .
#> 4       1       1       .
#> 5       .       .       1
#> 6       1       .       .
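If you would rather avoid the tidyverse dependency, the same factor-with-all-levels idea can be sketched in base R with stack() and table(). This is an illustrative variant, not the method used above: it returns a dense table rather than a sparse Matrix, and the names long and tab are made up for the example.

```r
# Same example data as a plain data frame
df <- data.frame(
  person1 = c(1L, 4L, 6L),
  person2 = c(2L, 4L, NA),
  person3 = c(NA, 5L, NA)
)

# Long format: one row per cell, with columns `values` and `ind` (the source column)
long <- stack(df)

# Cross-tabulate; the factor levels force a row for every value from 1 to the maximum,
# and table() drops NA entries by default
tab <- table(
  value = factor(long$values, levels = seq_len(max(long$values, na.rm = TRUE))),
  key   = long$ind
)

tab
```

If a person can list the same value more than once, the cells hold counts instead of 0/1 flags; `(tab > 0) + 0L` would turn them back into indicators.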

Upvotes: 0
