Reputation: 67
I am trying to do a simple clustering manually (without using any clustering algorithm) based on the distance between the points. I used the Pearson correlation between rows to compute the distance:
```r
c <- round(cor(t(df)), digits = 2)
d <- as.dist(1 - c)
```
I want to group all points whose correlation is greater than a certain threshold, for example 0.7. How can I cluster these data points in R?
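A minimal base-R sketch of this rule, done by hand without any clustering function. It assumes that a point joins a cluster when its correlation with at least one member is above the threshold, so groups can chain (this is also what the igraph answer computes). Each row is a point, two rows are linked when their correlation exceeds the threshold, and a breadth-first search gives every connected group the same number. The toy data frame and the names `corr`, `linked` and `cluster` are hypothetical; the signed correlation is used, as `1 - c` does above.

```r
# Hypothetical toy data: each row is one point to cluster (like U00, U01, ...)
df <- as.data.frame(rbind(
  U01 = c(1, 2, 3, 4),
  U02 = c(2, 4, 6, 8),
  U03 = c(4, 3, 2, 1),
  U04 = c(1, 2, 3, 5),
  U05 = c(2, 4, 1, 3),
  U06 = c(0, 0, 0, 0)
))
names(df) <- paste0("A", 1:4)

threshold <- 0.7
# Row-wise Pearson correlation, as in the question; a constant row gives NA
corr <- suppressWarnings(cor(t(df)))
# TRUE where two different rows are correlated above the threshold
linked <- !is.na(corr) & corr > threshold
diag(linked) <- FALSE

cluster <- rep(NA_integer_, nrow(df))
next_id <- 0L
for (start in seq_len(nrow(df))) {
  if (!is.na(cluster[start])) next  # row already belongs to a cluster
  next_id <- next_id + 1L
  cluster[start] <- next_id
  queue <- start
  # Breadth-first search: pull in every row reachable through links
  while (length(queue) > 0) {
    i <- queue[1]
    queue <- queue[-1]
    nbrs <- which(linked[i, ] & is.na(cluster))
    cluster[nbrs] <- next_id
    queue <- c(queue, nbrs)
  }
}

df$cluster <- cluster
df
```

For this toy data, U01, U02 and U04 share one cluster, while U03 (perfectly anti-correlated with U01), U05 and the constant row U06 each end up alone.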
The first rows and columns of my data frame look like this (there are 188 rows and 31 columns in total):
| | A1 | A2 | A3 | A4 | A5 |
| --- | --- | --- | --- | --- | --- |
| U00 | 0 | 0 | 0 | 0 | 0 |
| U01 | 0 | 0 | 84 | 0 | 0 |
| U02 | 0 | 1 | 0 | 0 | 0 |
| U03 | 0 | 0 | 0 | 0 | 0 |
| U04 | 0 | 0 | 0 | 0 | 0 |
| U05 | 0 | 0 | 0 | 0 | 0 |
| U06 | 0 | 0 | 0 | 0 | 0 |
And here is the corresponding distance matrix `d`:
| | U00 | U01 | U02 | U03 | U04 | U05 | U06 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| U01 | 0.05 | | | | | | |
| U02 | 1.04 | 1.05 | | | | | |
| U03 | 1.04 | 1.04 | 0.92 | | | | |
| U04 | 1.04 | 1.04 | 0.92 | 0.00 | | | |
| U05 | 1.04 | 1.04 | 0.92 | 0.00 | 0.00 | | |
| U06 | 1.04 | 1.04 | 0.92 | 0.00 | 0.00 | 0.00 | |
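Because `d = 1 - c`, a correlation above 0.7 is a distance below 0.3. As a quick check of what the threshold rule gives on the seven rows shown, this sketch rebuilds the distance table in R and swaps in single-linkage `hclust()` cut at height `1 - threshold`. Single linkage is used here only because its clusters at a given height are exactly the groups of points chained together by distances at or below that height (so a tie at exactly 0.3 would be included). The names `labels`, `m` and `groups` are hypothetical.

```r
# Distances copied from the table above (lower triangle, column by column)
labels <- paste0("U0", 0:6)
m <- matrix(0, 7, 7, dimnames = list(labels, labels))
m[lower.tri(m)] <- c(0.05, 1.04, 1.04, 1.04, 1.04, 1.04,  # column U00
                     1.05, 1.04, 1.04, 1.04, 1.04,        # column U01
                     0.92, 0.92, 0.92, 0.92,              # column U02
                     0.00, 0.00, 0.00,                    # column U03
                     0.00, 0.00,                          # column U04
                     0.00)                                # column U05
d <- as.dist(m)

threshold <- 0.7
# Single linkage cut at 1 - threshold: rows chained by d <= 0.3 share a group
groups <- cutree(hclust(d, method = "single"), h = 1 - threshold)
split(names(groups), groups)
```

On these seven rows this gives three groups: U00 with U01, U02 on its own, and U03 to U06 together. Only these rows are used, so on the full 188 rows other rows could still chain these groups together.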
In the end, I would like to have an extra column in my data frame containing the cluster number of each row. Thank you in advance!
Upvotes: 0
Views: 191
Reputation: 5232
Things like this can be done using the igraph package:
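```r
library(igraph)  # igraph also re-exports the magrittr pipe %>%

threshold <- 0.7
# cor(t(df)) correlates the rows (U00, U01, ...), as in the question
adj <- abs(cor(t(df))) > threshold
adj[is.na(adj)] <- FALSE  # constant rows give NA correlations
graph_from_adjacency_matrix(adj * 1, mode = "undirected", diag = FALSE) %>%
  components() %>%
  membership() %>%
  split(names(.), .)
```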
Note: I used the absolute correlation; if you only want positive correlations above the threshold, just remove `abs()`.
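The question also asks for the cluster number as an extra column. A minimal sketch of that on a hypothetical toy data frame: instead of splitting the names, take the component membership of each vertex. The vertices follow the row order of `df` because the adjacency matrix is built from `cor(t(df))`, so the membership vector lines up with the rows. The names `corr`, `adj` and `g` are illustrative only.

```r
library(igraph)

# Hypothetical toy data: each row is one point to cluster
df <- as.data.frame(rbind(
  U01 = c(1, 2, 3, 4),
  U02 = c(2, 4, 6, 8),
  U03 = c(4, 3, 2, 1),
  U04 = c(1, 2, 3, 5),
  U05 = c(2, 4, 1, 3),
  U06 = c(0, 0, 0, 0)
))
names(df) <- paste0("A", 1:4)

threshold <- 0.7
corr <- suppressWarnings(cor(t(df)))  # NA for constant rows such as U06
adj <- !is.na(corr) & abs(corr) > threshold
g <- graph_from_adjacency_matrix(adj * 1, mode = "undirected", diag = FALSE)

# One cluster id per row, in the row order of df
df$cluster <- as.integer(components(g)$membership)
df
```

With the absolute correlation, U03 joins U01, U02 and U04, while U05 and U06 stay alone. Keeping one id per row also means `table(df$cluster)` gives the cluster sizes directly.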
Upvotes: 1