Przemo
Przemo

Reputation: 13

Combining variables in dataframe

I have such a dataframe:

KEY C1  C2  C3  C4
A   0   0   1   0
B   0   0   1   0
C   0   1   1   0
D   0   0   1   0
E   1   0   1   0
F   1   0   0   0
G   0   1   0   0
H   0   0   1   0
I   0   1   1   0
J   1   0   0   1

and would like to build this kind of matrix with only two values "1" being in the two variables.

I would not like to count rows where there are more then two values like:

KEY C1  C2  C3  C4
L   1   0   1   1

or less then two:

M   1   0   0  0

Output should be frequency table.

   C1 C2 C3 C4
C1 3  0  1  1
C2 0  3  2  0
C3 1  2  7  0
C4 1  0  0  1

There may be more variables up to C20 and of course more rows. Thanks for helping me out!

Upvotes: 1

Views: 84

Answers (2)

akrun
akrun

Reputation: 886998

Try

 m1 <- t(df1[-1])
 colnames(m1) <- df1[,1]
 tcrossprod(m1)
 #   C1 C2 C3 C4
 #C1  3  0  1  1
 #C2  0  3  2  0
 #C3  1  2  7  0
 #C4  1  0  0  1

Regarding the subset part, I am not getting the expected result,

 df1 <- df1[rowSums(df1[-1])==2,]
 m1 <- t(df1[-1])
 colnames(m1) <- df1[,1]
 tcrossprod(m1)
 #   C1 C2 C3 C4
 #C1  2  0  1  1
 #C2  0  2  2  0
 #C3  1  2  3  0
 #C4  1  0  0  1

data

df1 <- structure(list(KEY = c("A", "B", "C", "D", "E", "F", "G", "H", 
"I", "J"), C1 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L), C2 = c(0L, 
0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), C3 = c(1L, 1L, 1L, 1L, 1L, 
0L, 0L, 1L, 1L, 0L), C4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
1L)), .Names = c("KEY", "C1", "C2", "C3", "C4"), class = "data.frame", 
row.names = c(NA, -10L))

Upvotes: 3

Aaron Katch
Aaron Katch

Reputation: 451

looks like you want to subset first. Try this:

df  <-  read.csv("file1.csv")

df2 <-  subset(df, rowSums(df[,-1]) == 2)

m1 <- t(df2[-1])

colnames(m1) <- df1[,1]
tcrossprod(m1)

This gives

#     C1 C2 C3 C4
# C1  2  0  1  1
# C2  0  2  2  0
# C3  1  2  3  0
# C4  1  0  0  1

Upvotes: 2

Related Questions