T Richard

Reputation: 601

Create a matrix of 0's and 1's from a data frame using R

How can I create a matrix of 0's and 1's from a data set with three columns, hosp (hospital), pid (patient id) and treatment, as shown below?

df<-
structure(list(
hosp=c(1L,1L,1L,1L,1L,1L,2L,2L,2L),
pid=c(1L,1L,1L,2L,3L,3L,4L,5L,5L),
treatment=c(0L,0L,0L,1L,1L,1L,0L,1L,1L)
),
.Names=c("hosp","pid","treatment"),
class="data.frame",row.names=c(NA,-9))

The matrix should have one row per observation (9 in this case) and one column per unique hospital. The entries should be the treatment values: an entry is 1 if the patient in that row received treatment 1 in that column's hospital, and 0 otherwise. The matrix should look like:

matrix(c(0,0,
0,0,
0,0,
1,0,
1,0,
1,0,
0,0,
0,1,
0,1),nrow=9,byrow=TRUE)

Any help would be much appreciated, thanks.

Upvotes: 0

Views: 92

Answers (3)

G. Grothendieck

Reputation: 269526

1) Create a model matrix from hosp as a factor with no intercept term and multiply that by treatment:

hosp <- factor(df$hosp)
model.matrix(~ hosp + 0) * df$treatment

giving:

  hosp1 hosp2
1     0     0
2     0     0
3     0     0
4     1     0
5     1     0
6     1     0
7     0     0
8     0     1
9     0     1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$hosp
[1] "contr.treatment"
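The result above still carries the column names and the "assign" and "contrasts" attributes that model.matrix attaches. If a bare matrix like the one in the question is wanted, one possible way (a sketch, not part of the original answer) is to drop them afterwards; the names m and expected are only illustrative:

```r
# Rebuild the question's data so the example is self-contained.
df <- data.frame(
  hosp      = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  pid       = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L),
  treatment = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L)
)

# Indicator columns per hospital, scaled row-wise by treatment.
hosp <- factor(df$hosp)
m <- model.matrix(~ hosp + 0) * df$treatment

# Strip the model.matrix metadata and the row/column names.
attr(m, "assign") <- NULL
attr(m, "contrasts") <- NULL
dimnames(m) <- NULL

# Compare against the matrix requested in the question.
expected <- matrix(c(0, 0,
                     0, 0,
                     0, 0,
                     1, 0,
                     1, 0,
                     1, 0,
                     0, 0,
                     0, 1,
                     0, 1), nrow = 9, byrow = TRUE)
all(m == expected)
```

Keeping the hosp1/hosp2 column names may be preferable in practice, in which case only the two attr() lines are needed.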

2) outer(hosp, unique(hosp), "==") is the model matrix of hosp except using TRUE/FALSE in place of 1/0. Multiply that by treatment.

with(df, outer(hosp, unique(hosp), "==") * treatment)

giving:

      [,1] [,2]
 [1,]    0    0
 [2,]    0    0
 [3,]    0    0
 [4,]    1    0
 [5,]    1    0
 [6,]    1    0
 [7,]    0    0
 [8,]    0    1
 [9,]    0    1
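To see why the multiplication works, it can help to look at the two steps separately. The sketch below uses the question's hosp and treatment values as plain vectors; the names ind and res are only illustrative:

```r
hosp      <- c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L)
treatment <- c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L)

# Step 1: a 9 x 2 logical matrix; ind[i, j] is TRUE when row i
# belongs to the j-th unique hospital.
ind <- outer(hosp, unique(hosp), "==")

# Step 2: treatment has one value per row, so it is recycled down
# each column, and TRUE/FALSE are treated as 1/0 in the product.
res <- ind * treatment

dim(ind)
res
```

Because unique(hosp) keeps hospitals in order of first appearance, the columns follow that order rather than sorted order.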

Update: Added (1) and simplified (2).

Upvotes: 1

Antonios

Reputation: 1939

How about:

> sapply(unique(df$hosp),function(x) ifelse(df$hosp==x&df$treatment==1,1,0))
      [,1] [,2]
 [1,]    0    0
 [2,]    0    0
 [3,]    0    0
 [4,]    1    0
 [5,]    1    0
 [6,]    1    0
 [7,]    0    0
 [8,]    0    1
 [9,]    0    1

Upvotes: 0

Matt W.

Reputation: 3722

Here's my workaround for this. Not the cleanest, but it works!

require(dplyr)
require(tidyr)  # spread() and gather() come from tidyr, not dplyr

df2 <- df %>% 
  mutate(x = row_number()) %>% 
  select(-pid) %>% 
  spread(x, treatment)

df3 <- df2 %>% 
  gather("keys", "value", 2:10) %>% 
  spread(hosp, value) %>% 
  select(-keys)

df3[is.na(df3)] <- 0
df3 <- as.matrix(df3)

Step by Step:

Take the original df and add a row number so that we can spread without duplicate identifiers. We also drop pid, since it isn't needed in the final matrix.

require(dplyr)
require(tidyr)  # spread() and gather() come from tidyr, not dplyr

df2 <- df %>% 
  mutate(x = row_number()) %>% 
  select(-pid) %>% 
  spread(x, treatment)

Then we gather it back to long form and spread again by hosp, so that each hospital becomes a column:

df3 <- df2 %>% 
  gather("keys", "value", 2:10) %>% 
  spread(hosp, value) %>% 
  select(-keys)

Some of the values are still NA, so we convert them to 0s and then turn the result into a matrix using as.matrix():

df3[is.na(df3)] <- 0
df3 <- as.matrix(df3)

  1 2
1 0 0
2 0 0
3 0 0
4 1 0
5 1 0
6 1 0
7 0 0
8 0 1
9 0 1

Upvotes: 0
