Reputation: 23
I have a matrix that looks like the following:
col_1 col_2 value
A B 2.1
A C 1.3
B C 4.6
A D 1.4
....
I would like to get a similarity matrix:
A B C D
A X 2.1 1.3 1.4
B 2.1 X 4.6 ...
C ... ... X ...
D ... ... ... X
So the row and the column names are A,B,C,D and the it takes the value from the third column and adds it to the matrix The problem is also that the original matrix has a length of approximately 10 000 rows.
Upvotes: 0
Views: 2319
Reputation: 18701
With xtabs
and mutate_at
. sparse = TRUE
turns the output to a sparseMatrix:
library(dplyr)
mat <- df %>%
mutate_at(1:2, factor, levels = unique(c(levels(.$col_1), levels(.$col_2)))) %>%
xtabs(value ~ col_1 + col_2, data=., sparse = TRUE)
mat[lower.tri(mat)] <- mat[upper.tri(mat)]
Result:
4 x 4 sparse Matrix of class "dgCMatrix"
col_2
col_1 A B C D
A . 2.1 1.3 1.4
B 2.1 . 4.6 .
C 1.3 1.4 . .
D 4.6 . . .
Upvotes: 2
Reputation: 33753
As Roland suggests, you could use dcast()
:
library(data.table)
dcast(df, col_1 ~ col_2)
## col_1 B C D
## 1 A 2.1 1.3 1.4
## 2 B NA 4.6 NA
Where:
df <- data.frame(
col_1 = c("A", "A", "B", "A"),
col_2 = c("B","C", "C", "D"),
value = c(2.1, 1.3, 4.6, 1.4)
)
Upvotes: 2
Reputation: 1514
You can do it the following way. I write the code in Python since no language is specified
#I assume that your data is in a python pandas dataframe called df
df = ..load your data
list_of_labels = [ 'A','B','C','D' ]
nb_labels = len(list_of_labels)
similarity = np.zeros( (nb_labels,nb_labels) )
for l1, l2, val in zip( df['col_1'] , df['col_2'] , df['value'] ):
i = list_of_labels.index( l1 )
j = list_of_labels.index( l2 )
similarity[i][j] = val
similarity_df = pd.DataFrame(data=similarity, index=list_of_labels, columns=list_of_labels)
Upvotes: 1