SR1614

Reputation: 75

Split one massive plot into smaller sub-plots for better visualisation in ggplot

I have survival/sampling dates for over 500 dogs. Each dog was sampled at least once, and several were sampled three or four times. For example:

Microchip_number     Date        Sampling_occasion
White notched fatso  20,11,2018  First
White notched fatso  28,12,2018  Second
White notched fatso  09,04,2019  Third
White notched fatso  23,10,2019  Fourth
Tuttu Jeevan         06,12,2018  First
Tuttu Jeevan         03,01,2019  Second
Tuttu Jeevan         04,05,2019  Third
Tuppy                22,10,2018  First
Tuppy                20,11,2018  Second
Tuppy                17,04,2019  Third
Tuppy                31,07,2019  Lost to study

I've managed to plot this in ggplot, but the result is a very large image that has to be zoomed in and scrolled to see the sampling times of each individual dog.

(Image: plot of outcomes for all dogs)

I've found suggestions to split large data frames by some variable (e.g. month) or to use facet_wrap, but my data has no such variable. Is there a way to split this large plot into several smaller plots that can be read clearly without zooming, like the one below, without having to subset the data frame and plot each subset separately?

(Image: how I'd like each split/sub-plot to appear)

This is the code I'm using:

library(readxl)
library(ggplot2)

# Read the spreadsheet: dog ID as text, sampling date as a date, outcome as text
outcomes <- read_xlsx("Dog outcomes.xlsx", col_types = c("text", "date", "text"))

# Each dog becomes one discrete row on the y axis
outcomes$Microchip_number <- as.factor(outcomes$Microchip_number)

# Fix the order in which outcomes appear in the legend
outcomes$Sampling_occasion <- factor(outcomes$Sampling_occasion,
                                     levels = c("First", "Second", "Third", "Fourth", "Lost to study", "Died"))

# One point per sampling date, joined by a line for each dog
ggplot(outcomes) +
  geom_point(aes(x = Date, y = Microchip_number, colour = Sampling_occasion, shape = Sampling_occasion)) +
  geom_line(aes(x = Date, y = Microchip_number, group = Microchip_number, colour = Sampling_occasion)) +
  theme_bw()
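Since "Dog outcomes.xlsx" isn't available to anyone else, here is a minimal sketch that rebuilds the sample rows from the table above in base R, so the plotting code can be tried without the file. It assumes the dates really are day,month,year separated by commas, as shown. Note that read_xlsx() with a "date" column type returns date-times (POSIXct) rather than Date values; ggplot handles either on the x axis.

```r
# Hand-typed stand-in for read_xlsx("Dog outcomes.xlsx"), using only the
# sample rows shown in the question (not the full 500-dog dataset).
outcomes <- data.frame(
  Microchip_number = c(rep("White notched fatso", 4), rep("Tuttu Jeevan", 3),
                       rep("Tuppy", 4)),
  Date = c("20,11,2018", "28,12,2018", "09,04,2019", "23,10,2019",
           "06,12,2018", "03,01,2019", "04,05,2019",
           "22,10,2018", "20,11,2018", "17,04,2019", "31,07,2019"),
  Sampling_occasion = c("First", "Second", "Third", "Fourth",
                        "First", "Second", "Third",
                        "First", "Second", "Third", "Lost to study"),
  stringsAsFactors = FALSE
)

# Parse "dd,mm,yyyy": the commas in the format string are matched literally.
outcomes$Date <- as.Date(outcomes$Date, format = "%d,%m,%Y")

# Same factor conversions as in the question's code.
outcomes$Microchip_number <- factor(outcomes$Microchip_number)
outcomes$Sampling_occasion <- factor(
  outcomes$Sampling_occasion,
  levels = c("First", "Second", "Third", "Fourth", "Lost to study", "Died")
)

str(outcomes)
```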

Upvotes: 1

Views: 1602

Answers (2)

SR1614

Reputation: 75

Thanks so much, Jrm_FRL; the code to add the counter and subgroup columns was exactly what I needed! As Gregor mentioned, facet_wrap just made things harder to read, so instead I used a for loop over subgroup to plot 50 dogs per PDF page (the same approach works with any other graphics device). This is the code I used, and it worked perfectly, although for some reason the Microchip_number values are displayed in reverse order (68481, 68480, 68479, etc.), even though they are sorted the other way round in the main data frame, outcomes. Minor quibble, however! This makes it much easier to visualise outcomes for specific individuals. Cheers!

library(dplyr)
library(ggplot2)

# counter starts at 1 and increments for each new dog (assumes each dog's rows are adjacent);
# subgroup then changes every 50 dogs
outcomes2 <- outcomes %>% 
  mutate(counter = 1 + cumsum(c(0, as.numeric(diff(Microchip_number)) != 0)),
         subgroup = as.factor(ceiling(counter / 50)))

pdf(file = "All_outcomes_50.pdf")  # one page per subgroup
for (grp in levels(outcomes2$subgroup)) {
  df_grp <- filter(outcomes2, subgroup == grp)

  p <- ggplot(df_grp) +
    geom_point(aes(x = Date, y = Microchip_number, colour = Sampling_occasion, shape = Sampling_occasion)) +
    geom_line(aes(x = Date, y = Microchip_number, group = Microchip_number, colour = Sampling_occasion)) +
    theme_bw()
  print(p)  # inside a loop, a ggplot must be printed explicitly to be drawn
}
dev.off()

(Image: new plot after using the 'for' loop)
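The reversed order is most likely due to how ggplot2 draws a discrete y axis: the first factor level is placed at the bottom, so with the default ascending levels the highest microchip number ends up at the top. A minimal base-R sketch on a hypothetical toy data frame (dogs), using the microchip numbers quoted above, reverses the levels so the first dog in the data is drawn at the top:

```r
# Hypothetical toy data: the microchip numbers quoted above, stored in
# ascending order as in the main data frame.
dogs <- data.frame(
  Microchip_number = factor(c("68479", "68479", "68480", "68481", "68481"))
)

# factor() sorts the levels, and ggplot2 puts the first level at the bottom
# of a discrete y axis, so 68481 would appear at the top.
print(levels(dogs$Microchip_number))

# Reverse the levels: the first dog in the data becomes the last level,
# which ggplot2 then draws at the top of the axis.
dogs$Microchip_number <- factor(
  dogs$Microchip_number,
  levels = rev(unique(as.character(dogs$Microchip_number)))
)
print(levels(dogs$Microchip_number))
```

Doing this once on outcomes2 before the loop should be enough, because a discrete scale drops levels that are absent from each filtered subset by default. In ggplot2 3.3.0 and later, adding scale_y_discrete(limits = rev) to each plot should have the same effect without modifying the data.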

Upvotes: 2

Jrm_FRL

Reputation: 1413

You can simply divide your dataset into subgroups that each contain the same number of dogs (e.g. 10). Add an intermediate counter column to get around the small difficulty that each dog does not necessarily have the same number of rows.

I would suggest:

library('dplyr')
outcomes <- outcomes %>% 
  mutate(counter = 1 + cumsum(c(0,as.numeric(diff(Microchip_number))!=0)), # this counter starting at 1 increments for each new dog
         subgroup = as.factor(ceiling(counter/10)))

You will obtain a new dataset with a factor column, subgroup, whose value changes every 10 dogs. Then just add + facet_wrap(.~subgroup) to your plot.
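A variant of this approach, sketched on hypothetical toy data (25 dogs named dog01 to dog25 with made-up dates, not the real dataset). It differs from the code above in two ways. First, the counter uses match() against unique(), which numbers dogs by order of first appearance and does not require each dog's rows to be adjacent. Second, facet_wrap() gets scales = "free_y", so each panel's y axis should list only that panel's dogs; with the default fixed scales every panel lists all dogs, which is probably why the facets were hard to read.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: 25 dogs, each sampled twice, 60 days apart.
toy <- data.frame(
  Microchip_number = factor(rep(sprintf("dog%02d", 1:25), each = 2)),
  Date = as.Date("2018-10-01") + rep(0:24, each = 2) + rep(c(0, 60), times = 25),
  Sampling_occasion = factor(rep(c("First", "Second"), times = 25),
                             levels = c("First", "Second"))
)

dogs_per_panel <- 10  # plays the role of the 10 in ceiling(counter/10)

toy <- toy %>%
  mutate(
    # 1 for the first dog seen, 2 for the second, and so on
    counter = match(Microchip_number, unique(Microchip_number)),
    subgroup = factor(ceiling(counter / dogs_per_panel))
  )

p <- ggplot(toy) +
  geom_point(aes(x = Date, y = Microchip_number,
                 colour = Sampling_occasion, shape = Sampling_occasion)) +
  geom_line(aes(x = Date, y = Microchip_number, group = Microchip_number,
                colour = Sampling_occasion)) +
  # free_y: each panel's y axis is trained only on that panel's dogs
  facet_wrap(~subgroup, scales = "free_y") +
  theme_bw()

print(p)
```

With 25 dogs and 10 dogs per panel this gives three panels (10, 10 and 5 dogs); for the real data, dogs_per_panel can be raised until the panels become hard to read.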

Hope this helps.

Upvotes: 1
