I've got survival/sampling date data for over 500 dogs. Each dog has been sampled at least once, and several have been sampled three or four times. For example:
Microchip_number      Date        Sampling_occasion
White notched fatso   20,11,2018  First
White notched fatso   28,12,2018  Second
White notched fatso   09,04,2019  Third
White notched fatso   23,10,2019  Fourth
Tuttu Jeevan          06,12,2018  First
Tuttu Jeevan          03,01,2019  Second
Tuttu Jeevan          04,05,2019  Third
Tuppy                 22,10,2018  First
Tuppy                 20,11,2018  Second
Tuppy                 17,04,2019  Third
Tuppy                 31,07,2019  Lost to study
I've managed to plot this in ggplot, but it's a very large image which requires zooming in and scrolling to view the sampling times of each individual dog.
I've found suggestions to split large dataframes based on a certain variable (e.g. month) or to use facet_wrap, but in my case, I don't have any such variable to use. Is there a way to split this large plot into multiple smaller plots that don't need to be zoomed in to view all the details clearly, such as below (without having to separately plot subsets of the dataframe)?
[Image: how I'd like each split/sub-plot to appear]
This is the code I'm using:
library(readxl)
library(ggplot2)

# Read the data: dog ID as text, sampling date as a date, outcome as text
outcomes <- read_xlsx("Dog outcomes.xlsx", col_types = c("text", "date", "text"))
outcomes$Microchip_number <- as.factor(outcomes$Microchip_number)
# Fix the order in which sampling occasions appear in the legend
outcomes$Sampling_occasion <- factor(outcomes$Sampling_occasion,
  levels = c("First", "Second", "Third", "Fourth", "Lost to study", "Died"))

g <- ggplot(outcomes)
g + geom_point(aes(x = Date, y = Microchip_number, colour = Sampling_occasion, shape = Sampling_occasion)) +
  geom_line(aes(x = Date, y = Microchip_number, group = Microchip_number, colour = Sampling_occasion)) +
  theme_bw()
Thanks so much, Jrm FRL: the code to add the counter and subgroup columns was exactly what I needed! As Gregor mentioned, facet_wrap just made things more difficult to view, so I used a for loop over subgroup to plot 50 dogs per PDF page (or per page of any other graphics device). This is the code I used, and it's worked perfectly, although for some reason the Microchip_number values are displayed in reverse (descending alphabetical) order from top to bottom (68481, 68480, 68479 etc.), even though they are sorted the other way round in the main dataframe 'outcomes'. Minor quibble, however! This makes it so much easier to visualise outcomes for specific individuals. Cheers!
library(dplyr)
library(ggplot2)

outcomes2 <- outcomes %>%
  mutate(counter = 1 + cumsum(c(0, as.numeric(diff(Microchip_number)) != 0)), # this counter starting at 1 increments for each new dog
         subgroup = as.factor(ceiling(counter / 50)))                          # 50 dogs per subgroup

pdf(file = "All_outcomes_50.pdf")
# One page per subgroup of 50 dogs
for (sg in levels(outcomes2$subgroup)) {
  df <- filter(outcomes2, subgroup == sg)
  wow <- ggplot(df) +
    geom_point(aes(x = Date, y = Microchip_number, colour = Sampling_occasion, shape = Sampling_occasion)) +
    geom_line(aes(x = Date, y = Microchip_number, group = Microchip_number, colour = Sampling_occasion)) +
    theme_bw()
  print(wow)  # ggplot objects are not auto-printed inside a loop
}
dev.off()
[Image: new plot after using the 'for' loop]
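The reversed order is standard ggplot2 behaviour rather than something in the data: on a discrete y-axis the first factor level is drawn at the bottom, and as.factor() sorts levels alphabetically, so the last microchip number ends up at the top. A minimal sketch of one fix, using a small hypothetical data frame shaped like outcomes2, is to set the levels to the reverse of the order in which dogs first appear in the data:

```r
library(ggplot2)

# Hypothetical toy data in the same shape as `outcomes2`
df <- data.frame(
  Microchip_number = c("68479", "68479", "68480", "68481", "68481"),
  Date = as.Date(c("2018-11-20", "2019-04-09", "2018-12-06", "2018-10-22", "2019-07-31")),
  Sampling_occasion = c("First", "Second", "First", "First", "Second")
)

# ggplot2 draws the first level of a discrete y-axis at the bottom, so
# reversing the order of first appearance puts the first dog at the top
df$Microchip_number <- factor(df$Microchip_number,
                              levels = rev(unique(df$Microchip_number)))

p <- ggplot(df, aes(x = Date, y = Microchip_number)) +
  geom_point(aes(colour = Sampling_occasion)) +
  geom_line(aes(group = Microchip_number)) +
  theme_bw()
```

In the loop, the same factor(..., levels = rev(unique(...))) line could be applied to outcomes2$Microchip_number before calling pdf(). Recent ggplot2 versions should also accept scale_y_discrete(limits = rev), which reverses the axis without modifying the data.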
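You can simply divide your dataset into sub-groups containing the same number of dogs (e.g. 10).
Add an intermediate counter column to overcome the small difficulty that each dog does not necessarily have the same number of rows. Because the counter looks for a change of dog between consecutive rows, it assumes that all rows for a given dog are adjacent, as in your example.
I would suggest: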
library('dplyr')
outcomes <- outcomes %>%
mutate(counter = 1 + cumsum(c(0,as.numeric(diff(Microchip_number))!=0)), # this counter starting at 1 increments for each new dog
subgroup = as.factor(ceiling(counter/10)))
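To see what the counter produces, the sketch below applies the same logic to the Microchip_number column of the sample data in the question. It is written with as.integer() so the factor-code arithmetic is explicit, and it uses a hypothetical page size of 2 dogs to keep the result small. It also shows match(id, unique(id)), a base R alternative (not part of the answer above) that numbers dogs by first appearance without requiring each dog's rows to be adjacent.

```r
# Microchip_number column of the sample data in the question
id <- factor(c(rep("White notched fatso", 4),
               rep("Tuttu Jeevan", 3),
               rep("Tuppy", 4)))

# The factor codes change exactly where a new dog starts, so the
# cumulative sum of the "changed" flags numbers the dogs 1, 2, 3, ...
counter <- 1 + cumsum(c(0, diff(as.integer(id)) != 0))
counter
# [1] 1 1 1 1 2 2 2 3 3 3 3

# Alternative that does not need each dog's rows to be adjacent:
# each ID's position among the unique IDs, in order of first appearance
counter2 <- match(id, unique(id))

# Hypothetical page size of 2 dogs (the answer uses 10)
subgroup <- as.factor(ceiling(counter / 2))
subgroup
# dogs 1 and 2 fall in subgroup "1", dog 3 in subgroup "2"
```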
You will obtain a new dataset with a factor subgroup column whose value changes every 10 dogs. Then just add + facet_wrap(. ~ subgroup) to your plot.
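One detail worth checking with faceting: facet_wrap() defaults to scales = "fixed", so every panel keeps the full set of dogs on its y-axis and the panels are no less crowded than the original plot. A minimal sketch, assuming outcomes already contains the subgroup column created above (the wrapper name make_faceted_plot is hypothetical), uses scales = "free_y" so each panel shows only its own dogs:

```r
library(ggplot2)

# Assumes `outcomes` already has the `subgroup` column created above.
# scales = "free_y" gives each panel its own y-axis, so a panel lists
# only the dogs in that subgroup instead of all of them.
make_faceted_plot <- function(outcomes) {
  ggplot(outcomes, aes(x = Date, y = Microchip_number)) +
    geom_point(aes(colour = Sampling_occasion, shape = Sampling_occasion)) +
    geom_line(aes(group = Microchip_number, colour = Sampling_occasion)) +
    facet_wrap(~ subgroup, scales = "free_y") +
    theme_bw()
}
```

With 500 dogs and 10 dogs per subgroup this still gives 50 panels on one device, so printing one subgroup per page may remain easier to read.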
Hope this will help.