ncnc_2020

Reputation: 67

Make a simple clustering manually in R

I am trying to build a simple clustering manually (without using any clustering algorithm), based on the distance between the points. I used the Pearson correlation to calculate the distance:

c <- round(cor(t(df)), digits = 2)
d <- as.dist(1 - c)

I want to cluster all points that have a correlation greater than a certain threshold, for example 0.7. How can I cluster these data points in R?

The first rows and columns of my data frame look like this (there are 188 rows and 31 columns in total):

|       |  A1 |  A2 |  A3 |  A4 |  A5 |
|  ---  | --- | --- | --- | --- | --- |
|  U00  |  0  |  0  |  0  |  0  |  0  |
|  U01  |  0  |  0  | 84  |  0  |  0  |
|  U02  |  0  |  1  |  0  |  0  |  0  |
|  U03  |  0  |  0  |  0  |  0  |  0  |
|  U04  |  0  |  0  |  0  |  0  |  0  |
|  U05  |  0  |  0  |  0  |  0  |  0  |
|  U06  |  0  |  0  |  0  |  0  |  0  |

and the resulting distances:

|       |   U00  |   U01  |   U02  |   U03  |   U04  |   U05  |   U06  |
|  ---  | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
|  U01  |  0.05  |        |        |        |        |        |        |
|  U02  |  1.04  |  1.05  |        |        |        |        |        |
|  U03  |  1.04  |  1.04  |  0.92  |        |        |        |        |
|  U04  |  1.04  |  1.04  |  0.92  |  0.00  |        |        |        |
|  U05  |  1.04  |  1.04  |  0.92  |  0.00  |  0.00  |        |        |
|  U06  |  1.04  |  1.04  |  0.92  |  0.00  |  0.00  |  0.00  |        |

In the end I would like to have an extra column in my data frame with the cluster number. Thank you in advance!

Upvotes: 0

Views: 191

Answers (1)

det

Reputation: 5232

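Things like this can be done with the igraph package: treat each pair whose correlation exceeds the threshold as an edge, then take the connected components of that graph as clusters.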

library(igraph)

threshold <- 0.7

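# Correlate rows (cor(t(df)), as in the question), link pairs above the
# threshold, and read the connected components off as clusters.
graph_from_adjacency_matrix(abs(cor(t(df))) > threshold, mode = "undirected", diag = FALSE) %>%
  components() %>%
  membership() %>%
  split(names(.), .)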

Note: I used the absolute correlation, so strongly negative correlations also link points. Remove abs() if only positive correlations should count.
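To get the cluster number as an extra column, as the question asks, the component membership can be written back to the data frame. The following is only a sketch under some assumptions: the toy `df` is made up for illustration (it is not the asker's data), the signed correlation is used as in the question, rows are correlated via `cor(t(df))`, and `NA` correlations (which `cor` returns for constant rows) are treated as "not connected".

```r
library(igraph)

# Hypothetical toy data: 6 rows (U00..U05), 4 columns.
df <- data.frame(
  A1 = c(1, 2, 9, 8, 0, 5),
  A2 = c(2, 4, 7, 6, 5, 0),
  A3 = c(3, 6, 5, 4, 0, 5),
  A4 = c(4, 8, 3, 2, 5, 1),
  row.names = paste0("U0", 0:5)
)

threshold <- 0.7

# Row-by-row Pearson correlation, as in the question.
cor_mat <- cor(t(df))

# 0/1 adjacency matrix: 1 where the correlation exceeds the threshold.
adj <- (cor_mat > threshold) * 1

# Assumption: rows with zero variance yield NA correlations; leave them unlinked.
adj[is.na(adj)] <- 0

# Undirected graph without self-loops; each connected component is one cluster.
g <- graph_from_adjacency_matrix(adj, mode = "undirected", diag = FALSE)
memb <- components(g)$membership

# Attach the cluster id to each row, matched by row name.
df$cluster <- as.integer(memb[rownames(df)])
df
```

Because clusters are connected components, membership is transitive: if A correlates with B and B with C above the threshold, all three land in one cluster even when A and C themselves fall below it. Replace `cor_mat` with `abs(cor_mat)` in the adjacency step to reproduce the absolute-correlation variant.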

Upvotes: 1
