Reputation: 129
I have used hclust() and cutree() to cluster the time series of several sites. I am looking for a way to merge the resulting clusters with the original data and plot each cluster's time series. For example, if I have 3 clusters with 1, 2, and 3 sites respectively, I need 3 plots, one per cluster, each showing the time series of that cluster's sites. Thanks, Rs
Example from the data
> data
Site1 Site2 Site3 Site4 Site5 Site6
1985 11 0 5 15 13 15
1986 12 12 5 31 14 26
1987 23 21 17 14 25 12
1988 22 25 18 17 24 14
1989 11 16 8 18 13 19
1990 7 5 21 8 9 24
1991 20 13 9 21 22 7
1992 15 11 6 19 17 20
1993 19 18 9 11 21 11
1994 33 9 28 17 26 20
1995 16 14 19 33 17 10
1996 14 21 25 4 6 47
1997 4 0 11 22 14 16
1998 10 31 13 26 12 14
1999 24 17 18 41 19 20
2000 21 17 23 19 23 14
2001 12 8 6 7 19 20
2002 19 24 19 31 24 17
2003 13 29 10 28 7 9
2004 19 14 19 22 20 13
2005 16 8 9 10 11 13
2006 8 9 46 9 20 19
2007 12 10 15 13 10 9
2008 12 18 25 12 47 22
2009 19 18 18 23 21 20
2010 23 10 46 35 25 12
2011 20 35 18 30 22 18
2012 23 13 23 34 25 34
2013 17 28 20 13 19 21
2014 19 22 16 16 21 23
Upvotes: 0
Views: 429
Reputation: 7153
This adds the cluster grouping to a transposed copy of the data.frame (one row per site):
df2 <- data.frame(t(df))
tree <- hclust(dist(df2))
df2$gp <- cutree(tree, k=3)
After that you would need to reshape your data for plotting, either with split() or in long form for ggplot. I didn't understand how you managed to get three clusters with 1, 2, and 3 sites, though. It would be easier if you shared your code instead of just describing it.
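As a minimal sketch of the split() route in base R: the stand-in df below holds only the first five years of the question's data, and the names gp, groups and years are illustrative, not from the original post. The site names are split by cluster and each cluster is drawn in its own panel with matplot():

```r
# Stand-in for the full data: first five years of the six sites from the question.
df <- data.frame(
  Site1 = c(11, 12, 23, 22, 11),
  Site2 = c(0, 12, 21, 25, 16),
  Site3 = c(5, 5, 17, 18, 8),
  Site4 = c(15, 31, 14, 17, 18),
  Site5 = c(13, 14, 25, 24, 13),
  Site6 = c(15, 26, 12, 14, 19),
  row.names = 1985:1989
)

# Cluster the sites (columns), so transpose before computing distances.
tree <- hclust(dist(t(df)))
gp <- cutree(tree, k = 3)            # named integer vector: site -> cluster

# List of site names per cluster.
groups <- split(names(gp), gp)

# One panel per cluster, one line per site.
years <- as.numeric(rownames(df))
op <- par(mfrow = c(1, length(groups)))
for (g in names(groups)) {
  y <- as.matrix(df[, groups[[g]], drop = FALSE])
  matplot(years, y, type = "l", lty = 1,
          xlab = "year", ylab = "val", main = paste("Cluster", g))
}
par(op)                              # restore the previous plot layout
```

With the full df from the "Input data" section the same code should apply unchanged; drop = FALSE keeps single-site clusters as a one-column matrix so matplot() still works.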
EDIT:
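Here's a way to put the grouping back in using the dplyr and tidyr packages:
library(dplyr)
library(tidyr)
# keep only the site name and its cluster
df2 <- df2 %>%
  mutate(site=rownames(.)) %>%
  select(site, gp)
# reshape the original data to long form and attach the cluster of each site
df3 <- df %>%
  mutate(year=rownames(.)) %>%
  gather(site, val, -year) %>%
  left_join(df2, by="site")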
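gather() still works, but current tidyr documents it as superseded by pivot_longer(). A hedged equivalent sketch, assuming a recent tidyr (1.0 or later) and using a five-year stand-in for df; the name clusters is illustrative and not from the original answer:

```r
library(dplyr)
library(tidyr)

# Stand-in for the full data: first five years of the six sites from the question.
df <- data.frame(
  Site1 = c(11, 12, 23, 22, 11),
  Site2 = c(0, 12, 21, 25, 16),
  Site3 = c(5, 5, 17, 18, 8),
  Site4 = c(15, 31, 14, 17, 18),
  Site5 = c(13, 14, 25, 24, 13),
  Site6 = c(15, 26, 12, 14, 19),
  row.names = 1985:1989
)

# Cluster the sites and keep a site -> cluster lookup table.
gp <- cutree(hclust(dist(t(df))), k = 3)
clusters <- data.frame(site = names(gp), gp = unname(gp))

# Long form: one row per (year, site), with the cluster attached by site.
df3 <- df %>%
  mutate(year = rownames(df)) %>%
  pivot_longer(-year, names_to = "site", values_to = "val") %>%
  left_join(clusters, by = "site")
```

The result has the same four columns (year, site, val, gp) as the gather() version, only in a different row order, which does not affect the plots.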
With the long form of df3, you can then plot the three clusters with ggplot, one facet per cluster:
library(ggplot2)
ggplot(df3, aes(as.numeric(year), val, colour=site)) +
  geom_line() +
  facet_wrap(~gp)
EDIT:
To put in separate legends for each plot:
df4 <- split(df3, df3$gp)
gplots <- lapply(df4, function(x) {
  ggplot(x, aes(as.numeric(year), val, colour=site)) +
    geom_line() +
    theme_bw() +
    theme(legend.position="bottom")
})
library(gridExtra)
do.call(grid.arrange, c(gplots, nrow=1))
EDIT #2:
To put site labels within the plot instead of in the legend, first remove the legend and manually extend the x-axis limits to leave room for the labels:
p <- ggplot(df3, aes(as.numeric(year), val, colour=site)) +
  geom_line() +
  guides(colour="none") +  # colour=FALSE is deprecated in recent ggplot2
  xlim(1985, 2018) +
  facet_wrap(~gp) +
  theme_bw()
p
Next, get the x, y positions of the text labels (each site's value in the last year) and add them to the plot:
text.frame <- df3 %>%
  filter(year == max(year)) %>%
  select(year, val, site, gp)
p + geom_text(data=text.frame, aes(label=site),
              nudge_x=3, colour="black", size=4)
Input data:
df <- structure(list(Site1 = c(11L, 12L, 23L, 22L, 11L, 7L, 20L, 15L,
19L, 33L, 16L, 14L, 4L, 10L, 24L, 21L, 12L, 19L, 13L, 19L, 16L,
8L, 12L, 12L, 19L, 23L, 20L, 23L, 17L, 19L), Site2 = c(0L, 12L,
21L, 25L, 16L, 5L, 13L, 11L, 18L, 9L, 14L, 21L, 0L, 31L, 17L,
17L, 8L, 24L, 29L, 14L, 8L, 9L, 10L, 18L, 18L, 10L, 35L, 13L,
28L, 22L), Site3 = c(5L, 5L, 17L, 18L, 8L, 21L, 9L, 6L, 9L, 28L,
19L, 25L, 11L, 13L, 18L, 23L, 6L, 19L, 10L, 19L, 9L, 46L, 15L,
25L, 18L, 46L, 18L, 23L, 20L, 16L), Site4 = c(15L, 31L, 14L,
17L, 18L, 8L, 21L, 19L, 11L, 17L, 33L, 4L, 22L, 26L, 41L, 19L,
7L, 31L, 28L, 22L, 10L, 9L, 13L, 12L, 23L, 35L, 30L, 34L, 13L,
16L), Site5 = c(13L, 14L, 25L, 24L, 13L, 9L, 22L, 17L, 21L, 26L,
17L, 6L, 14L, 12L, 19L, 23L, 19L, 24L, 7L, 20L, 11L, 20L, 10L,
47L, 21L, 25L, 22L, 25L, 19L, 21L), Site6 = c(15L, 26L, 12L,
14L, 19L, 24L, 7L, 20L, 11L, 20L, 10L, 47L, 16L, 14L, 20L, 14L,
20L, 17L, 9L, 13L, 13L, 19L, 9L, 22L, 20L, 12L, 18L, 34L, 21L,
23L)), .Names = c("Site1", "Site2", "Site3", "Site4", "Site5",
"Site6"), class = "data.frame", row.names = c("1985", "1986",
"1987", "1988", "1989", "1990", "1991", "1992", "1993", "1994",
"1995", "1996", "1997", "1998", "1999", "2000", "2001", "2002",
"2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010",
"2011", "2012", "2013", "2014"))
Upvotes: 3