Moore

Plotting multiple time series grouped by clustering results in R

I have used hclust() and cutree() to cluster the time series of several sites. I am looking for a way to merge the resulting cluster assignments with the original data and plot the time series of each cluster. For example, if I have 3 clusters containing 1, 2, and 3 sites respectively, I need 3 plots, one per cluster, each showing the time series of the sites in that cluster. Thanks.

Example of the data:

> data
     Site1 Site2 Site3 Site4 Site5 Site6
1985    11     0     5    15    13    15
1986    12    12     5    31    14    26
1987    23    21    17    14    25    12
1988    22    25    18    17    24    14
1989    11    16     8    18    13    19
1990     7     5    21     8     9    24
1991    20    13     9    21    22     7
1992    15    11     6    19    17    20
1993    19    18     9    11    21    11
1994    33     9    28    17    26    20
1995    16    14    19    33    17    10
1996    14    21    25     4     6    47
1997     4     0    11    22    14    16
1998    10    31    13    26    12    14
1999    24    17    18    41    19    20
2000    21    17    23    19    23    14
2001    12     8     6     7    19    20
2002    19    24    19    31    24    17
2003    13    29    10    28     7     9
2004    19    14    19    22    20    13
2005    16     8     9    10    11    13
2006     8     9    46     9    20    19
2007    12    10    15    13    10     9
2008    12    18    25    12    47    22
2009    19    18    18    23    21    20
2010    23    10    46    35    25    12
2011    20    35    18    30    22    18
2012    23    13    23    34    25    34
2013    17    28    20    13    19    21
2014    19    22    16    16    21    23


Answers (1)

Adam Quek

This inserts the cluster grouping into a transposed copy of the data frame (one row per site):

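df2 <- data.frame(t(df))        # transpose: one row per site, one column per year
tree <- hclust(dist(df2))       # hierarchical clustering on Euclidean distances between sites
df2$gp <- cutree(tree, k=3)     # cut the tree into 3 groups and store each site's group number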
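
To check how many sites ended up in each cluster (and which ones), you can tabulate and split the `gp` column. The sketch below is self-contained, so it generates a random stand-in for `df` with `rpois()` (an assumption, not the question's data); replace it with the `df` from the input data at the end of this answer. The name `members` is hypothetical.

```r
# Stand-in for the question's `df` (rows = years, columns = sites).
# Random data, for illustration only; use the real `df` instead.
set.seed(1)
df <- as.data.frame(matrix(rpois(30 * 6, lambda = 18), nrow = 30,
                           dimnames = list(as.character(1985:2014),
                                           paste0("Site", 1:6))))

# Same clustering as above: one row per site after transposing
df2 <- data.frame(t(df))
tree <- hclust(dist(df2))
df2$gp <- cutree(tree, k = 3)

# Number of sites in each cluster
print(table(df2$gp))

# Site names grouped by cluster number
members <- split(rownames(df2), df2$gp)
print(members)
```

With k = 3 every site is assigned to exactly one of three groups, so the cluster sizes always add up to the number of sites.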

After that, you need to rearrange your data for plotting, either by splitting it by group with split() or by converting it to long form for ggplot. I don't understand how you got three clusters with 1, 2, and 3 sites, though. It would be easier to help if you shared your code instead of just describing it.
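
For the split() route, a base-graphics sketch might look like this: split the site names by cluster and draw one matplot() panel per cluster, each with its own legend. The random stand-in data and the names `gp`, `years`, and `sites_by_gp` are hypothetical; with the real data, use the `df` defined under Input data.

```r
# Stand-in for the question's `df` (random, for illustration only)
set.seed(1)
df <- as.data.frame(matrix(rpois(30 * 6, lambda = 18), nrow = 30,
                           dimnames = list(as.character(1985:2014),
                                           paste0("Site", 1:6))))

# Named vector: site name -> cluster number
gp <- cutree(hclust(dist(t(df))), k = 3)
years <- as.numeric(rownames(df))

# Split the site names by cluster, then draw one panel per cluster
sites_by_gp <- split(names(gp), gp)
op <- par(mfrow = c(1, length(sites_by_gp)))
for (g in names(sites_by_gp)) {
  sites <- sites_by_gp[[g]]
  matplot(years, as.matrix(df[, sites, drop = FALSE]), type = "l", lty = 1,
          col = seq_along(sites), xlab = "year", ylab = "value",
          main = paste("Cluster", g))
  legend("topleft", legend = sites, col = seq_along(sites), lty = 1, bty = "n")
}
par(op)  # restore the previous layout
```

`drop = FALSE` keeps a one-site cluster as a one-column matrix, so clusters of any size go through the same matplot() call.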

EDIT:

Here's a way to attach the grouping to the original data in long form, using the dplyr and tidyr packages:

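library(dplyr)   # mutate(), select(), left_join(), %>%
library(tidyr)   # gather()

# one row per site with its cluster number
df2 <- df2 %>% 
       mutate(site=rownames(.)) %>% 
       select(site, gp)

# reshape df to long form (year, site, val) and attach the cluster number
df3 <- df %>% 
       mutate(year=rownames(.)) %>% 
       gather(site, val, -year) %>% 
       left_join(df2, by="site")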
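
Since tidyr 1.0.0, gather() is superseded by pivot_longer(). A sketch of the same reshaping with pivot_longer(), which also converts year to numeric up front, follows; the random stand-in data and the `groups` lookup table are illustrative assumptions.

```r
library(dplyr)
library(tidyr)

# Stand-in for the question's `df` (random, for illustration only)
set.seed(1)
df <- as.data.frame(matrix(rpois(30 * 6, lambda = 18), nrow = 30,
                           dimnames = list(as.character(1985:2014),
                                           paste0("Site", 1:6))))

# Cluster the sites (one row per site after transposing)
df2 <- data.frame(t(df))
df2$gp <- cutree(hclust(dist(df2)), k = 3)

# Lookup table: site name -> cluster number
groups <- data.frame(site = rownames(df2), gp = df2$gp)

# Long form: one row per (year, site), with the cluster attached
df3 <- df %>%
  mutate(year = as.numeric(rownames(df))) %>%
  pivot_longer(-year, names_to = "site", values_to = "val") %>%
  left_join(groups, by = "site")
```

Because year is already numeric here, the ggplot calls can use `aes(year, val, ...)` directly; `as.numeric(year)` still works as well.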

With df3 in long form, you can then plot the three clusters with ggplot2, one facet per cluster:

library(ggplot2)

# one panel per cluster (gp), one line per site
ggplot(df3, aes(as.numeric(year), val, colour=site)) + 
     geom_line() + 
     facet_wrap(~gp)


EDIT:

To give each cluster's plot its own legend, split df3 by group, build one plot per group, and arrange the plots side by side:

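library(ggplot2)
library(gridExtra)   # grid.arrange()

# one data frame per cluster
df4 <- split(df3, df3$gp)

# one ggplot per cluster, each with its own legend at the bottom
gplots <- lapply(df4, function(x) ggplot(x, aes(as.numeric(year), val, colour=site)) + 
               geom_line() + 
               theme_bw() +
               theme(legend.position="bottom"))

# arrange all plots in a single row
do.call(grid.arrange, c(gplots, nrow=1))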


EDIT #2:

To put the site labels inside the plot instead of in a legend, first remove the legend and extend the x-axis limits beyond the last year (2014) to leave room for the labels:

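p <- ggplot(df3, aes(as.numeric(year), val, colour=site)) + 
  geom_line() + 
  guides(colour="none") +   # hide the legend ("none" replaces the deprecated FALSE)
  xlim(1985, 2018) +        # extend the x-axis past 2014 to make room for labels
  facet_wrap(~gp) + 
  theme_bw()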

p


Next, get the x, y positions of the text labels, using each site's value in the last year:

# label positions: the last year of every site
text.frame <- df3 %>% 
              filter(year == max(year)) %>%
              select(year, val, site, gp)

# add the site names to the right of each line's end
p + geom_text(data=text.frame, aes(label=site), 
              nudge_x=3, colour="black", size=4)
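
Taking max(year) over the whole data frame assumes every site has a value in the final year. If series could end in different years, one variant is to group by site first so each label sits at the end of its own line. The sketch below builds a small long-form stand-in for `df3` from random data (an assumption) and places the labels with `hjust = 0` and a small nudge instead of `nudge_x = 3`.

```r
library(dplyr)
library(ggplot2)

# Stand-in for the question's `df` (random, for illustration only)
set.seed(1)
df <- as.data.frame(matrix(rpois(30 * 6, lambda = 18), nrow = 30,
                           dimnames = list(as.character(1985:2014),
                                           paste0("Site", 1:6))))
gp <- cutree(hclust(dist(t(df))), k = 3)   # site name -> cluster

# Long form like df3: one row per (year, site), numeric year
df3 <- data.frame(year = rep(as.numeric(rownames(df)), times = ncol(df)),
                  site = rep(colnames(df), each = nrow(df)),
                  val  = unlist(df, use.names = FALSE))
df3$gp <- gp[df3$site]

# Label position: the last observed year of each site separately
text.frame <- df3 %>%
  group_by(site) %>%
  filter(year == max(year)) %>%
  ungroup()

p <- ggplot(df3, aes(year, val, colour = site)) +
  geom_line() +
  geom_text(data = text.frame, aes(label = site),
            hjust = 0, nudge_x = 0.5, colour = "black", size = 4) +
  guides(colour = "none") +
  xlim(1985, 2018) +
  facet_wrap(~gp) +
  theme_bw()
print(p)
```

`hjust = 0` left-aligns each label at its anchor point, so long site names grow to the right instead of overlapping the line.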


Input data:

df <- structure(list(Site1 = c(11L, 12L, 23L, 22L, 11L, 7L, 20L, 15L, 
19L, 33L, 16L, 14L, 4L, 10L, 24L, 21L, 12L, 19L, 13L, 19L, 16L, 
8L, 12L, 12L, 19L, 23L, 20L, 23L, 17L, 19L), Site2 = c(0L, 12L, 
21L, 25L, 16L, 5L, 13L, 11L, 18L, 9L, 14L, 21L, 0L, 31L, 17L, 
17L, 8L, 24L, 29L, 14L, 8L, 9L, 10L, 18L, 18L, 10L, 35L, 13L, 
28L, 22L), Site3 = c(5L, 5L, 17L, 18L, 8L, 21L, 9L, 6L, 9L, 28L, 
19L, 25L, 11L, 13L, 18L, 23L, 6L, 19L, 10L, 19L, 9L, 46L, 15L, 
25L, 18L, 46L, 18L, 23L, 20L, 16L), Site4 = c(15L, 31L, 14L, 
17L, 18L, 8L, 21L, 19L, 11L, 17L, 33L, 4L, 22L, 26L, 41L, 19L, 
7L, 31L, 28L, 22L, 10L, 9L, 13L, 12L, 23L, 35L, 30L, 34L, 13L, 
16L), Site5 = c(13L, 14L, 25L, 24L, 13L, 9L, 22L, 17L, 21L, 26L, 
17L, 6L, 14L, 12L, 19L, 23L, 19L, 24L, 7L, 20L, 11L, 20L, 10L, 
47L, 21L, 25L, 22L, 25L, 19L, 21L), Site6 = c(15L, 26L, 12L, 
14L, 19L, 24L, 7L, 20L, 11L, 20L, 10L, 47L, 16L, 14L, 20L, 14L, 
20L, 17L, 9L, 13L, 13L, 19L, 9L, 22L, 20L, 12L, 18L, 34L, 21L, 
23L)), .Names = c("Site1", "Site2", "Site3", "Site4", "Site5", 
"Site6"), class = "data.frame", row.names = c("1985", "1986", 
"1987", "1988", "1989", "1990", "1991", "1992", "1993", "1994", 
"1995", "1996", "1997", "1998", "1999", "2000", "2001", "2002", 
"2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", 
"2011", "2012", "2013", "2014"))
