Reputation: 11
I wish to create a gantt type chart that will display the following data, grouped and counted. I have absolutely no idea how to approach this problem. My data is like this - over 600 rows.
12,3,4,5,6,7,"Type" 0,0,0,0,5,6,7,"C" 0,0,0,0,0,6,7,"C" 0,0,0,0,5,6,7,"C" 0,0,0,0,0,6,7,"C" 0,0,0,0,0,6,7,"C" 0,0,0,0,5,6,7,"C" 0,0,0,0,0,6,7,"C" 0,0,0,0,0,6,7,"C" 0,0,0,0,0,6,7,"C" 0,0,0,4,5,6,7,"Ch" 1,2,3,0,0,0,0,"Ch" 0,0,0,0,5,0,0,"Ch" 0,0,0,0,5,0,0,"Ch" 0,0,0,0,5,0,0,"Ch" 0,0,0,0,5,0,0,"Ch" 0,0,0,0,0,6,7,"Ch" 1,2,0,0,0,0,0,"Ch" 0,0,0,0,0,6,7,"Ch" 0,0,0,4,5,0,0,"Ch" 0,0,0,0,0,6,7,"Ch" 0,0,0,0,0,6,7,"Ch" 0,0,0,0,0,6,7,"Ch" 0,0,0,0,0,6,7,"D" 0,0,0,0,0,6,7,"D" 0,0,0,4,5,6,7,"D" 1,2,3,0,0,0,0,"D" 0,0,0,0,5,0,0,"D" 0,0,0,0,5,0,0,"D" 0,0,0,0,5,6,7,"D" 0,0,3,4,5,0,0,"D" 0,0,0,0,5,0,0,"D" 0,0,0,0,5,6,0,"P" 0,2,3,4,5,6,7,"P" 0,0,3,4,5,0,0,"P" 0,2,3,0,0,0,0,"P" 0,0,0,0,5,6,7,"P" 1,2,3,4,5,0,0,"P" 0,0,0,0,5,6,7,"P" 0,0,0,4,5,6,7,"P" 0,0,3,4,5,6,7,"P"
The first row is the column headers. The numbered columns (1,2,3,4,5,6,7) signify contiguous time periods, of different length; so 1 might be 8000 to 6000 BP, 2 might be 6000 to 5000 BP, etc. A number in that column signifies only presence (they might just as easily just be 1), a zero represents absence. The eighth column is Type, and there are six different types altogether. What I wish to show is a bar, with a count, on separate rows, for each group of time periods, keeping the 'types' separate. So for example, there might be 2 type 'P' that occur in periods 5,6,7; 3 type 'P' that occur in periods 1,2,3; 5 type 'D' that occur in periods 5,6,7; and so on. Ideally I would like the time periods to have different shades, and the types to have different colours. (It would be even neater to have the time periods reflect their proportionate actual length). I am tried various ideas but my knowledge of R is very limited at the moment. I would be very grateful for any ideas that somebody could come up with.
I haven't actually tried any coding yet, simply because I have been trying to work out the best way to show this data, but also at the same time have been putting together other charts for different data. I note from similar questions that timevis might give the sort of chart I am looking for. But it appears that timevis data frames are entered into the code, whereas I wish to pull in a csv data file, with the grouping and counting as an added requirement.
OK, to clarify here is a mock up from a spreadsheet. Only the blue type is entered; the other types would be differing shades of red, green etc..
And @Jon Spring's suggestion here, annotated with counts - so less rows would show
The grouping and counting will reduce the number of rows displayed - it would not be good to have 650 rows on a chart.
Upvotes: 1
Views: 72
Reputation: 11
Based on the code JonSpring submitted, it became apparent I could make things easier by changing, and simplifying, the data file. The datafile was formatted in a spreadsheet to include counts, reduce the number of rows, and calculate percentages (percentages per species, not total). This is made up data, but represents the periods that fossilised or preserved remains from these species date from, and the number of sites where those remains of those ages have been found.
"Species","RangeStart","RangeEnd","Num","PerCent"
"Reindeer",6000,0,9,100.0
"Megalocerus",6000,0,13,56.6
"Megalocerus",9000,0,5,21.7
"Megalocerus",11700,0,2,8.7
"Megalocerus",16000,9000,2,8.7
"Megalocerus",16000,11700,1,4.3
"Wild Boar",6000,0,70,92.1
"Wild Boar",9000,0,2,2.6
"Wild Boar",11700,0,4,5.3
"Wolf",6000,0,257,48.0
"Wolf",9000,0,104,19.4
"Wolf",9000,6000,4,0.7
"Wolf",11700,0,85,15.9
"Wolf",11700,6000,4,0.7
"Wolf",11700,9000,2,0.4
"Wolf",12900,0,13,2.4
"Wolf",12900,6000,2,0.4
"Wolf",12900,9000,6,1.1
"Wolf",16000,0,28,5.2
"Wolf",16000,9000,26,4.8
"Wolf",16000,11700,3,0.6
"Wolf",16000,12900,2,0.4
"Red Deer",6000,0,4,100.0
"Black Bear",6000,0,22,84.6
"Black Bear",9000,0,3,11.5
"Black Bear",11700,0,1,3.9
Following on from that, the following code gave me pretty much what I was looking for.
df<-read.csv("FilePath/Filename.csv",stringsAsFactors = F,check.names = F)
df_transformed <- df |>
group_by(Species) |>
mutate(is_first = row_number() == 1) |>
ungroup()
df_transformed2 <- df_transformed |>
mutate(row = row_number(), .keep="all") |>
mutate(Age = (RangeStart+RangeEnd)/2, .keep="all") |>
mutate(Width = RangeStart-RangeEnd, .keep="all") |>
group_by(Age)
ggplot(df_transformed2, aes(x=Age, y=row, width=Width, fill = Species)) +
annotate("rect", xmin=12900, xmax=16000, ymin=-Inf, ymax=Inf, alpha = .2) +
annotate("text", x = (12900+ 16000)/2, y=0, label="Postglacial", color="black",size=3,fontface="bold") +
annotate("text", x = (12900+11700)/2, y=0, label="PG/Hol", color="black",size=3,fontface="bold") +
annotate("text", x = (12900+11700)/2, y=-0.5, label="Transition", color="black",size=3,fontface="bold") +
annotate("rect", xmin=9000, xmax=11700, ymin=-Inf, ymax=Inf, alpha = .2) +
annotate("text", x = (9000+11700)/2, y=0, label="Early Holocene", color="black",size=3,fontface="bold") +
annotate("text", x = (9000+6000)/2, y=0, label="Mid Holocene", color="black",size=3,fontface="bold") +
annotate("rect", xmin=250, xmax=6000, ymin=-Inf, ymax=Inf, alpha = .2) +
annotate("text", x = (6000+250)/2, y=0, label="Late Holocene", color="black",size=3,fontface="bold") +
annotate("text", x = (250+0)/2, y=0, label="Industrial", color="black",size=3,fontface="bold") +
annotate("rect", xmin=0, xmax=50, ymin=-Inf, ymax=Inf, alpha = .2) +
annotate("text", x = (50+0)/2, y=-0.5, label="Modern", color="black", size = 3,fontface="bold") +
geom_tile(aes(fill=Species),color = "black", alpha=0.7) +
theme(legend.position = "none") +
scale_x_reverse(breaks=c(16000,12900,11700,9000,6000,4000,2000,0)) +
geom_text( aes(x = 2000, label = paste0(" ", Num)), hjust =1, size = 3, fontface = "bold", color="black") +
geom_text( aes(x = 1000, label = paste0(" (", PerCent, "%) ")), hjust =1, size = 3, fontface = "bold", color="black") +
geom_text(data = filter(df_transformed2, is_first), aes(x = 5000, label=Species), hjust =0, size = 3, fontface = "bold", color="black") +
scale_y_continuous(breaks = NULL) +
xlab("Periods spanned / Years BP") +
ylab(NULL)
I can still tweak it, but my level of understanding has been massively improved. The output follows, and I am very happy with that as a result:-
Upvotes: 0
Reputation: 66880
Quick proof of concept using ggplot, dplyr, and tidyr by loading library(tidyverse)
.
YOUR_DATA |>
mutate(row = row_number()) |>
pivot_longer(V1:V7) |> # loaded as V1 since `1` is not a syntactic column name
filter(value > 0) |>
ggplot(aes(value, row, fill = Type)) +
geom_tile()
...and a variation where I join the data to a table with the era widths, so I can show the first period longer:
YOUR_DATA |>
mutate(row = row_number()) |>
pivot_longer(V1:V7) |>
filter(value > 0) |>
left_join(data.frame(name = paste0("V",1:7),
from = c(8000, 6000, 5000, 4000, 3000, 2000, 1000)) |>
# wrangling to simplify geom_tile; could alternately use geom_rect
mutate(to = lead(from, default = 0),
mid = (from + to) / 2,
width = from - to)) |>
ggplot(aes(mid, row, width = width, fill = Type)) +
geom_tile(color = "black") +
scale_x_reverse()
Upvotes: 3