Reputation: 157
I have a large data set and need to compare all combinations between columns. The desired output would be a matrix for each column combination.
The data frame to start with could look like Data
:
set.seed(1)
Data <- data.frame(
ID = (1:100),
A = sample(1:10,10),
B = sample(1:20,100,replace = T),
C = sample(1:5,100,replace = T),
D = sample(1:20,100,replace = T)
)
Data
and I want to know how often appears the same combination within two columns.
(e.g. how often is a 1
in A
in the same row with a 4
in B
)
for all combinations between column A to D?
I was using:
require(dplyr)
X1 <- ddply(Data,.(A,B),transmute, count=length(ID))
and get a object like:
A B count
1 1 3 1
2 1 7 1
3 1 9 2
4 1 9 2
5 1 12 1
6 1 13 1
7 1 14 1
8 1 16 1
9 1 18 1
10 1 20 1
11 2 2 1
12 2 6 1
13 2 10 1
14 2 11 1
But how can I get the count
result in a matrix format?
The output for A vs. B cold look like:
B1 B2 B3 B4 B5 B6
A1 1 1 2 1 1 ...
A2 1 1 2 1 1
A3 2 1 1 1 1
A4 2 1 1 1 1
A5 1 1 2 1 2
A6 1 1 2 1 2
A7 1 3 1 1 1
A8 1 3 1 1 2
A9 1 3 2 1 2
A10 1 1 2 1 1
In the best case the result would be a `list` containing the objects `AB` `AC` ...`CD` as matrix.
Upvotes: 0
Views: 439
Reputation: 107687
Consider base R using aggregate
and reshape
:
agg <- aggregate(cbind(count=ID) ~ B + A, Data, FUN=length)
rdf <- reshape(agg, timevar = "B", idvar = "A",
drop = c("ID", "C", "D"),
direction = "wide")
# CLEAN-UP
rdf <- with(rdf, rdf[order(A), c("A", paste0("count.", 1:20))]) # RE-ORDER ROWS AND COLS
rownames(rdf) <- NULL # RESET ROW NAMES
colnames(rdf) <- gsub("count.", "B", names(rdf)) # RENAME COL NAMES
rdf[is.na(rdf)] <- 0 # CONVERT NAs TO O
rdf
# A B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15 B16 B17 B18 B19 B20
# 1 1 1 1 0 1 0 1 0 0 0 1 0 1 0 1 0 0 0 0 2 1
# 2 2 0 0 0 0 0 0 2 0 1 3 0 0 2 0 1 0 0 0 1 0
# 3 3 1 1 0 0 0 1 0 0 1 0 0 0 0 2 1 1 1 0 0 1
# 4 4 1 0 0 0 1 2 0 0 0 2 1 1 0 1 0 1 0 0 0 0
# 5 5 0 0 0 0 1 2 1 3 0 0 0 1 0 0 0 0 0 0 1 1
# 6 6 1 0 2 0 1 2 1 0 0 0 0 0 1 0 0 0 0 0 0 2
# 7 7 1 0 0 0 1 1 1 0 0 2 1 0 0 0 0 2 1 0 0 0
# 8 8 0 0 2 0 0 0 1 2 1 0 2 0 0 0 0 0 0 1 1 0
# 9 9 1 0 0 0 0 1 1 0 0 0 0 1 1 2 1 0 0 1 1 0
# 10 10 1 1 0 0 0 0 0 0 1 1 0 2 0 1 1 0 1 0 1 0
Upvotes: 0
Reputation: 4284
You can do this :
library(tidyverse)
X2 <-X1 %>% group_by(A,B) %>%
summarise(count=max(count)) %>% #use max instead of sum
ungroup() %>%
mutate(A=paste0("A",A),B=paste0("B",B)) %>%
spread(B,count,fill=0)
X3 <- as.matrix(X2[,2:ncol(X2)])
rownames(X3) <- as.character(X2$A)
Upvotes: 1