Reputation: 13
I have such a dataframe:
KEY C1 C2 C3 C4
A 0 0 1 0
B 0 0 1 0
C 0 1 1 0
D 0 0 1 0
E 1 0 1 0
F 1 0 0 0
G 0 1 0 0
H 0 0 1 0
I 0 1 1 0
J 1 0 0 1
and would like to build this kind of matrix with only two values "1" being in the two variables.
I would not like to count rows where there are more then two values like:
KEY C1 C2 C3 C4
L 1 0 1 1
or less then two:
M 1 0 0 0
Output should be frequency table.
C1 C2 C3 C4
C1 3 0 1 1
C2 0 3 2 0
C3 1 2 7 0
C4 1 0 0 1
There may be more variables up to C20 and of course more rows. Thanks for helping me out!
Upvotes: 1
Views: 84
Reputation: 886998
Try
m1 <- t(df1[-1])
colnames(m1) <- df1[,1]
tcrossprod(m1)
# C1 C2 C3 C4
#C1 3 0 1 1
#C2 0 3 2 0
#C3 1 2 7 0
#C4 1 0 0 1
Regarding the subset
part, I am not getting the expected result,
df1 <- df1[rowSums(df1[-1])==2,]
m1 <- t(df1[-1])
colnames(m1) <- df1[,1]
tcrossprod(m1)
# C1 C2 C3 C4
#C1 2 0 1 1
#C2 0 2 2 0
#C3 1 2 3 0
#C4 1 0 0 1
df1 <- structure(list(KEY = c("A", "B", "C", "D", "E", "F", "G", "H",
"I", "J"), C1 = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L), C2 = c(0L,
0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L), C3 = c(1L, 1L, 1L, 1L, 1L,
0L, 0L, 1L, 1L, 0L), C4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L)), .Names = c("KEY", "C1", "C2", "C3", "C4"), class = "data.frame",
row.names = c(NA, -10L))
Upvotes: 3
Reputation: 451
looks like you want to subset first. Try this:
df <- read.csv("file1.csv")
df2 <- subset(df, rowSums(df[,-1]) == 2)
m1 <- t(df2[-1])
colnames(m1) <- df1[,1]
tcrossprod(m1)
This gives
# C1 C2 C3 C4
# C1 2 0 1 1
# C2 0 2 2 0
# C3 1 2 3 0
# C4 1 0 0 1
Upvotes: 2