Reputation: 79
I would like to change a dataframe so that it only contains unique values on each row. For example, suppose I have a dataframe like this:
person1 person2 person3
1 2 NA
4 4 5
6 NA NA
But I want to change it so that each row contains only one distinct value (row n holds the value n, or NA where a person does not have it):
person1 person2 person3
1 NA NA
NA 2 NA
NA NA NA
4 4 NA
NA NA 5
6 NA NA
In the end, I want to build an incidence matrix like this:
person1 person2 person3
1 1 0 0
2 0 1 0
3 0 0 0
4 1 1 0
5 0 0 1
6 1 0 0
Does someone have a suggestion on how to do this with R?
Upvotes: 1
Views: 371
Reputation: 8621
One way is to allocate a matrix with as many rows as the highest value in the data frame, then use a simple loop to fill it with 1s in the correct positions.
Let's call the allocated matrix output, giving it the same colnames as the original data frame.
max.value <- max(df, na.rm = TRUE)                        # highest value in df, ignoring NAs
output <- matrix(0, nrow = max.value, ncol = ncol(df))    # one row per possible value
colnames(output) <- colnames(df)                          # one column per person
Now we have a 6x3 matrix of zeros. A simple nested loop then goes through each column j of output and, for each value i in the matching column of df, assigns 1 to row i of that column.
for (j in seq_len(ncol(output))) {   # for each column of the output matrix
  for (i in na.omit(df[[j]])) {      # for each non-missing value in column j of df
    output[i, j] <- 1                # assign 1 to row i of that column
  }
}
> output
person1 person2 person3
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 0
[4,] 1 1 0
[5,] 0 0 1
[6,] 1 0 0
This should work with as many people and rows as you need, as long as the values are positive whole numbers, because they are used as row indices.
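For what it's worth, the same fill can probably be done without an explicit loop by indexing the matrix with a two-column (row, column) matrix. This is only a sketch of that idea, not part of the approach above; the names vals, keep and idx are made up for illustration, and it assumes every non-missing value is a positive whole number.

```r
# Test data from the question (same values as the dput output)
df <- data.frame(
  person1 = c(1L, 4L, 6L),
  person2 = c(2L, 4L, NA),
  person3 = c(NA, 5L, NA)
)

vals <- as.matrix(df)   # 3 x 3 integer matrix, NAs kept
keep <- !is.na(vals)    # TRUE where a value is present

# One (row, column) pair per non-missing value:
# row = the value itself, column = the person it came from
idx <- cbind(vals[keep], col(vals)[keep])

# Allocate the zero matrix as before, then set all pairs in one step
output <- matrix(0, nrow = max(vals, na.rm = TRUE), ncol = ncol(df),
                 dimnames = list(NULL, colnames(df)))
output[idx] <- 1
output
```

Whether this reads better than the loop is a matter of taste; the loop is arguably easier to follow, while the indexed assignment avoids iterating in R code.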
Addendum: here's the dput of the test data frame.
structure(list(person1 = c(1L, 4L, 6L), person2 = c(2L, 4L, NA
), person3 = c(NA, 5L, NA)), .Names = c("person1", "person2",
"person3"), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 1
Reputation: 1427
This doesn't fill in "missing" values (e.g. no one has a 3) but will create a sparse incidence matrix.
library(tidyverse)
data = tribble(
  ~person1, ~person2, ~person3,
  1, 2, NA,
  4, 4, 5,
  6, NA, NA
)
data %>%
  gather(key, value, na.rm = TRUE) %>%
  xtabs(~ value + key, data = ., sparse = TRUE)
#> 5 x 3 sparse Matrix of class "dgCMatrix"
#> person1 person2 person3
#> 1 1 . .
#> 2 . 1 .
#> 4 1 1 .
#> 5 . . 1
#> 6 1 . .
If you also want rows for the "missing" values, convert the value column to a factor whose levels cover the full range, so that unused levels still get a row.
For instance:
data %>%
  gather(key, value, na.rm = TRUE) %>%
  # Add factor with levels 1:6 --> 1, 2, 3, 4, 5, 6
  mutate(value = factor(value, levels = 1:6)) %>%
  xtabs(~ value + key, data = ., sparse = TRUE)
#> 6 x 3 sparse Matrix of class "dgCMatrix"
#> person1 person2 person3
#> 1 1 . .
#> 2 . 1 .
#> 3 . . .
#> 4 1 1 .
#> 5 . . 1
#> 6 1 . .
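If you would rather not depend on the tidyverse, the same factor-levels trick should also work in base R with stack() and table(). Treat this as a rough sketch rather than a drop-in replacement: it returns a dense 0/1 matrix instead of a sparse one, and the names long, all_values, counts and incidence are only illustrative.

```r
df <- data.frame(
  person1 = c(1, 4, 6),
  person2 = c(2, 4, NA),
  person3 = c(NA, 5, NA)
)

# Long format: column "values" holds the numbers, "ind" holds the person
long <- stack(df)

# Every value from 1 up to the maximum, so unused ones (here 3) get a row
all_values <- seq_len(max(long$values, na.rm = TRUE))

# table() drops NAs by default; the factor levels force the empty rows
counts <- table(value = factor(long$values, levels = all_values),
                key = long$ind)

# Collapse counts to 0/1 in case a value repeats within one column
incidence <- (unclass(counts) > 0) * 1L
incidence
```

The last step only matters if a person can have the same value more than once; in the example data every count is already 0 or 1.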
Upvotes: 0