Reputation: 119
I have a dataset from which I am trying to create a grouped bar chart. The groups are the "before" and "after" timepoints, and the subgroups are 6 different devices. The "before" group has only 3 of the 6 subgroups, while the "after" group has all 6. I am having two main issues: getting the code to arrange the subgroups in ascending order, and getting the "Before" group to appear to the left of the "After" group in the plot.
Below is the code that I have for the dataset FDA_co_tier:
library(tidyverse)
library(ggplot2)
F_dev_dec <- FDA_co_tier%>%
select(device_type, year_approved) %>%
mutate(year_2015 = if_else(year_approved >= 2015, "after", "before")) %>%
mutate(year_2015 = factor(year_2015, levels=c("before", "after"))) %>%
group_by(device_type, year_2015) %>%
summarise(N=n()) %>%
mutate(N= factor(N, levels = N))
which gives me a table of:
`summarise()` has grouped output by 'device_type'. You can override using the `.groups` argument.
# A tibble: 9 x 3
# Groups: device_type [6]
device_type year_2015 N
<chr> <fct> <fct>
1 Accessories after 6
2 Aspiration_catheter before 4
3 Aspiration_catheter after 32
4 Guidewire before 3
5 Guidewire after 23
6 Microcatheter after 7
7 Sheath after 19
8 Stentretriever before 17
9 Stentretriever after 22
I converted both the year_2015 variable and the N variable to factors, and tried to set the year_2015 levels to "before" then "after".
As mentioned above, I want to plot the different devices as bars for the "Before" and "After" timepoints, in ascending order of N. To do this, I initially tried:
F_dev_dec %>%
ggplot(aes(x= year_2015, fill= device_type)) +
geom_bar(position = position_dodge()) +
labs(title = NULL,
x= NULL,
y= "Count (n)")
which gave me this graph: https://i.sstatic.net/NAM8i.png
I am not sure why this has the correct groups ("Before", "After") but all the device subgroups go to 1, as if it were calculating some odd percentage.
I next tried to adjust the code and not summarize and not double group the data table by device and year_2015:
F_dev_dec <- FDA_co_tier%>%
select(device_type, year_approved) %>%
mutate(year_2015 = if_else(year_approved >= 2015, "after", "before")) %>%
mutate(year_2015 = factor(year_2015, levels=c("before", "after"))) %>%
group_by(device_type, year_2015)
Then, using the same ggplot code as above, I get something a little closer, but the subgroups are not in ascending order of N: https://i.sstatic.net/wSm5L.png
After further trial and error, the only way I've been able to get the plot to look the way I want is with this code:
F_dev_dec <- FDA_co_tier%>%
select(device_type, year_approved) %>%
mutate(year_2015 = if_else(year_approved >= 2015, "before", "after")) %>%
mutate(year_2015 = factor(year_2015, levels=c("before", "after"))) %>%
group_by(device_type, year_2015) %>%
summarise(N=n())
`summarise()` has grouped output by 'device_type'. You can override using the `.groups` argument.
# A tibble: 9 x 3
# Groups: device_type [6]
device_type year_2015 N
<chr> <fct> <int>
1 Accessories after 6
2 Aspiration_catheter before 4
3 Aspiration_catheter after 32
4 Guidewire before 3
5 Guidewire after 23
6 Microcatheter after 7
7 Sheath after 19
8 Stentretriever before 17
9 Stentretriever after 22
F_dev_dec %>%
ggplot(aes(x= year_2015, y= N, fill= reorder(device_type, N))) +
geom_bar(data=F_dev_dec %>% filter(year_2015 == "before"), width=0.9, position = position_dodge(), stat = "identity") +
geom_bar(data=F_dev_dec %>% filter(year_2015 == "after"), width=0.5, position = position_dodge(), stat = "identity") +
scale_x_discrete(labels = c("Before", "After")) +
theme_classic()+
labs(title = NULL,
x= NULL,
y= "Count (n)")
which gives the plot I'm looking for: https://i.sstatic.net/KHfzk.png
Of note: in the code immediately above, the mutation of FDA_co_tier that generates the variable year_2015 keeps messing up the final plot. When I attempt to do the mutation correctly with mutate(year_2015 = if_else(year_approved >= 2015, "after", "before")), the final plot looks like this: https://i.sstatic.net/G4vyU.png
The data table gets reversed (values that should be "before" are labeled as "after" 2015, and values that should be "after" are labeled as "before"). It's more annoying than anything, as the groups all still contain the same values, but the labels are wrong, so I had to add the scale_x_discrete(labels = c("Before", "After")) line to correct the plot.
Any advice on how to get the subgroups in the correct ascending order, and on what is going on with the code that generates the plot, would be much appreciated!
Upvotes: 1
Views: 1100
Reputation: 4949
You need to set the appropriate levels for the device_type variable. The easiest way to do it is this way.
library(tidyverse)
F_dev_dec = read.table(
header = TRUE,text="
device_type year_2015 N
Accessories after 6
Aspiration_catheter before 4
Aspiration_catheter after 32
Guidewire before 3
Guidewire after 23
Microcatheter after 7
Sheath after 19
Stentretriever before 17
Stentretriever after 22
") %>% as_tibble()
F_dev_dec = F_dev_dec %>%
group_by(year_2015) %>%
arrange(N) %>%
mutate(
year_2015 = year_2015 %>% factor(c("before", "after")),
device_type = device_type %>% fct_inorder()
)
F_dev_dec %>%
ggplot(aes(x= year_2015, y= N, fill= reorder(device_type, N))) +
geom_bar(data=F_dev_dec %>% filter(year_2015 == "before"), width=0.9, position = position_dodge(), stat = "identity") +
geom_bar(data=F_dev_dec %>% filter(year_2015 == "after"), width=0.5, position = position_dodge(), stat = "identity") +
scale_x_discrete(labels = c("Before", "After")) +
theme_classic()+
labs(title = NULL,
x= NULL,
y= "Count (n)")
Note the order of the commands: first arrange(N), and then device_type = device_type %>% fct_inorder().
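To see why that order matters, here is a minimal sketch using only the "after" counts from the question. It assumes dplyr and forcats are installed; the names after_counts and ordered_counts are made up for this illustration.

```r
library(dplyr)
library(forcats)

# "After" counts copied from the summary table in the question
after_counts <- data.frame(
  device_type = c("Accessories", "Aspiration_catheter", "Guidewire",
                  "Microcatheter", "Sheath", "Stentretriever"),
  N = c(6, 32, 23, 7, 19, 22)
)

# Sort the rows by N first, then freeze that row order as the factor levels
ordered_counts <- after_counts %>%
  arrange(N) %>%
  mutate(device_type = fct_inorder(device_type))

# The levels now follow ascending N rather than alphabetical order
levels(ordered_counts$device_type)
```

fct_inorder() takes the levels in order of first appearance, so calling it before arrange(N) would simply keep the original row order.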
P.S.
You could have shared your FDA_co_tier data table with us. Without it, I had to read the summary in using read.table.
Update 1
Phew! It took some effort to get the effect you expect, but in the end it worked. Let's see what we have here. First, I will work on your full, unsummarised data, which I reconstructed for myself from the counts in your table.
library(tidyverse)
FDA_co_tier = tibble(
device_type = c(rep("Accessories", 6),
rep("Aspiration_catheter", 36),
rep("Guidewire", 26),
rep("Microcatheter", 7),
rep("Sheath", 19),
rep("Stentretriever", 39)),
year_2015 = c(rep("after", 6),
rep("before", 4),
rep("after", 32),
rep("before", 3),
rep("after", 49),
rep("before", 17),
rep("after", 22)))
output
# A tibble: 133 x 2
device_type year_2015
<chr> <chr>
1 Accessories after
2 Accessories after
3 Accessories after
4 Accessories after
5 Accessories after
6 Accessories after
7 Aspiration_catheter before
8 Aspiration_catheter before
9 Aspiration_catheter before
10 Aspiration_catheter before
# ... with 123 more rows
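As a quick sanity check, the reconstructed rows can be counted back into the summary table from the question. This is only a sketch, assuming dplyr is installed; the name counts is introduced here for illustration.

```r
library(dplyr)

# Same reconstruction as above: one row per approved device
FDA_co_tier <- tibble(
  device_type = rep(
    c("Accessories", "Aspiration_catheter", "Guidewire",
      "Microcatheter", "Sheath", "Stentretriever"),
    times = c(6, 36, 26, 7, 19, 39)
  ),
  year_2015 = rep(
    c("after", "before", "after", "before", "after", "before", "after"),
    times = c(6, 4, 32, 3, 49, 17, 22)
  )
)

# Count rows per device and period; the result column is called n
counts <- FDA_co_tier %>% count(device_type, year_2015)
counts
```

This should give back the 9 device/period combinations from the question. geom_bar() with its default stat counts rows in the same way, which is why the plot below can work directly on the unsummarised data.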
Now let's create the plot. Note one clever line of code, geom_bar(position = position_dodge(), alpha = 0): it does not display anything; it only sets the expected order of the variable year_2015.
FDA_co_tier %>%
mutate(
year_2015 = year_2015 %>% factor(c("before", "after"))) %>%
ggplot(aes(year_2015, fill=device_type))+
geom_bar(position = position_dodge(), alpha=0)+
geom_bar(data = . %>%
filter(year_2015 == "before") %>%
mutate(device_type = device_type %>% fct_infreq() %>% fct_rev()),
position = position_dodge())+
geom_bar(data = . %>%
filter(year_2015 == "after") %>%
mutate(device_type = device_type %>% fct_infreq() %>% fct_rev()),
position = position_dodge())
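The fct_infreq() %>% fct_rev() step is what puts the bars within each period in ascending order. Here is a minimal sketch of that step on its own, using the "after" counts from the question and assuming forcats is installed; after_devices and ascending are names invented for this example.

```r
library(forcats)

# One entry per "after" device, repeated according to the counts in the question
after_devices <- factor(rep(
  c("Accessories", "Aspiration_catheter", "Guidewire",
    "Microcatheter", "Sheath", "Stentretriever"),
  times = c(6, 32, 23, 7, 19, 22)
))

# fct_infreq() orders levels from most to least frequent;
# fct_rev() reverses that, giving least to most frequent
ascending <- fct_rev(fct_infreq(after_devices))
levels(ascending)
```

Because each geom_bar() layer above applies this to its own filtered subset, the "before" and "after" groups are each ordered by their own counts.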
Upvotes: 2