Reputation: 601
How can I create a matrix of 0s and 1s from a data set with three columns, hosp (hospital), pid (patient ID) and treatment, as shown below?
df<-
structure(list(
hosp=c(1L,1L,1L,1L,1L,1L,2L,2L,2L),
pid=c(1L,1L,1L,2L,3L,3L,4L,5L,5L),
treatment=c(0L,0L,0L,1L,1L,1L,0L,1L,1L)
),
.Names=c("hosp","pid","treatment"),
class="data.frame",row.names=c(NA,-9))
The matrix should have one row per observation (9 in this case) and one column per unique hospital. The entries should be the treatment values: an entry is 1 if the observation belongs to that hospital and the patient received treatment 1, and 0 otherwise. The matrix should look like this:
matrix(c(0,0,
0,0,
0,0,
1,0,
1,0,
1,0,
0,0,
0,1,
0,1),nrow=9,byrow=TRUE)
Any help would be much appreciated, thanks.
Upvotes: 0
Views: 92
Reputation: 269526
1) Create a model matrix from hosp as a factor with no intercept term and multiply that by treatment:
hosp <- factor(df$hosp)
model.matrix(~ hosp + 0) * df$treatment
giving:
hosp1 hosp2
1 0 0
2 0 0
3 0 0
4 1 0
5 1 0
6 1 0
7 0 0
8 0 1
9 0 1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$hosp
[1] "contr.treatment"
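If the extra "assign" and "contrasts" attributes in that output get in the way, they can be dropped afterwards. A minimal sketch, assuming the df from the question; the name m is only for illustration:

```r
# Rebuild the example data from the question
df <- data.frame(
  hosp      = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  pid       = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L),
  treatment = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L)
)

# One indicator column per hospital, no intercept, scaled row-wise by treatment
hosp <- factor(df$hosp)
m <- model.matrix(~ hosp + 0) * df$treatment

# Drop the bookkeeping attributes that model.matrix() attaches
attr(m, "assign") <- NULL
attr(m, "contrasts") <- NULL
m
```

The multiplication works row-wise because the matrix is stored column by column and treatment has exactly one value per row, so it is recycled once per column.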
2) outer(hosp, unique(hosp), "==") is the model matrix of hosp except using TRUE/FALSE in place of 1/0. Multiply that by treatment.
with(df, outer(hosp, unique(hosp), "==") * treatment)
giving
[,1] [,2]
[1,] 0 0
[2,] 0 0
[3,] 0 0
[4,] 1 0
[5,] 1 0
[6,] 1 0
[7,] 0 0
[8,] 0 1
[9,] 0 1
Update: Added (1) and simplified (2).
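To see what (2) is doing, it may help to look at the intermediate logical matrix before the multiplication. A rough sketch, assuming the df from the question; ind and res are illustrative names, and the column labels are an optional extra not in the answer above:

```r
# Rebuild the example data from the question
df <- data.frame(
  hosp      = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  pid       = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L),
  treatment = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L)
)

# 9 x 2 logical matrix: TRUE where the row's hospital matches the column's hospital
ind <- outer(df$hosp, unique(df$hosp), "==")

# Multiplying coerces TRUE/FALSE to 1/0 and applies treatment row by row
res <- ind * df$treatment

# Optional: label the columns by hospital
colnames(res) <- paste0("hosp", unique(df$hosp))
res
```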
Upvotes: 1
Reputation: 1939
How about:
> sapply(unique(df$hosp),function(x) ifelse(df$hosp==x&df$treatment==1,1,0))
[,1] [,2]
[1,] 0 0
[2,] 0 0
[3,] 0 0
[4,] 1 0
[5,] 1 0
[6,] 1 0
[7,] 0 0
[8,] 0 1
[9,] 0 1
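The ifelse() is not strictly needed here, since the logical condition can be converted to 0/1 directly. A possible variant, assuming the df from the question and a treatment coded as 0/1 (res is an illustrative name):

```r
# Rebuild the example data from the question
df <- data.frame(
  hosp      = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  pid       = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L),
  treatment = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L)
)

# One column per hospital; as.integer() turns TRUE/FALSE into 1/0
res <- sapply(unique(df$hosp),
              function(h) as.integer(df$hosp == h & df$treatment == 1))
res
```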
Upvotes: 0
Reputation: 3722
Here's my workaround for this. Not the cleanest, but it works!
library(dplyr)
library(tidyr)  # spread() and gather() come from tidyr, not dplyr
df2 <- df %>%
mutate(x = row_number()) %>%
select(-pid) %>%
spread(x, treatment)
df3 <- df2 %>%
gather("keys", "value", 2:10) %>%
spread(hosp, value) %>%
select(-keys)
df3[is.na(df3)] <- 0
df3 <- as.matrix(df3)
Step by Step:
Take the original df and add a row_number to it so we can spread without duplication. We'll also remove pid since you're changing this to a matrix.
library(dplyr)
library(tidyr)  # spread() and gather() come from tidyr, not dplyr
df2 <- df %>%
mutate(x = row_number()) %>%
select(-pid) %>%
spread(x, treatment)
Then we want to change it back to long form and spread it again, this time by hosp:
df3 <- df2 %>%
gather("keys", "value", 2:10) %>%
spread(hosp, value) %>%
select(-keys)
Some of the values are still NA, so we convert them into 0s, and then turn it into a matrix using as.matrix:
df3[is.na(df3)] <- 0
df3 <- as.matrix(df3)
1 2
1 0 0
2 0 0
3 0 0
4 1 0
5 1 0
6 1 0
7 0 0
8 0 1
9 0 1
Upvotes: 0