Reputation: 87
Above is my dataset, just a simple one. It shows the GDP per capita of the richest and the poorest regions in nine countries in 2000 and 2015, as well as the gap in GDP per capita between the poorest and richest regions. Below is a reproducible example of this dataset:
df <- structure(list(Country = c("Britain", "Germany", "United States",
"France", "South Korea", "Italy", "Japan", "Spain", "Sweden"),
Poor2000 = c(69, 50, 74, 52, 79, 50, 80, 80, 90), Poor2015 = c(61,
48, 73, 50, 73, 52, 78, 84, 82), Rich2000 = c(848, 311, 290,
270, 212, 180, 294, 143, 148), Rich2015 = c(1150, 391, 310,
299, 200, 198, 290, 151, 149)), row.names = c(NA, -9L), class = c("tbl_df",
"tbl", "data.frame"))
I want to make a plot like this:
In this plot I just want to show the GDP per capita of the poorest regions in the nine countries in 2000 and 2015 (the draft picture has only three countries for the sake of convenience). But I don't know how to do it using ggplot, because it seems like I need to put "Country" on the x-axis and two variables, "Poor2000" and "Poor2015", on the y-axis. Many thanks in advance.
Upvotes: 1
Views: 69
Reputation: 16178
Here is a possible solution. Starting from your dataframe, you can first reshape it into a longer format. For that, I used the pivot_longer function from the tidyr package:
library(tidyr)
library(dplyr)
library(ggplot2)  # needed for the plot further down
DF <- df %>% select(Country, Poor2000, Poor2015) %>%
  mutate(Diff = Poor2015 - Poor2000) %>%
  pivot_longer(-Country, names_to = "Poor", values_to = "value")
# A tibble: 27 x 3
   Country       Poor     value
   <fct>         <chr>    <dbl>
 1 Britain       Poor2000    69
 2 Britain       Poor2015    61
 3 Britain       Diff        -8
 4 Germany       Poor2000    50
 5 Germany       Poor2015    48
 6 Germany       Diff        -2
 7 United States Poor2000    74
 8 United States Poor2015    73
 9 United States Diff        -1
10 France        Poor2000    52
# … with 17 more rows
We will also create a second dataframe that contains the difference between Poor2000 and Poor2015, plus the height at which to place that label:
DF_second_label <- df %>% select(Country, Poor2000, Poor2015) %>%
  group_by(Country) %>%
  mutate(Diff = Poor2015 - Poor2000, ypos = max(Poor2000, Poor2015))
# A tibble: 9 x 5
# Groups:   Country [9]
  Country       Poor2000 Poor2015  Diff  ypos
  <fct>            <dbl>    <dbl> <dbl> <dbl>
1 Britain             69       61    -8    69
2 Germany             50       48    -2    50
3 United States       74       73    -1    74
4 France              52       50    -2    52
5 South Korea         79       73    -6    79
6 Italy               50       52     2    52
7 Japan               80       78    -2    80
8 Spain               80       84     4    84
9 Sweden              90       82    -8    90
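As a side note, the per-country maximum could probably also be computed without grouping, using base R's vectorised pmax(). This is only a minimal sketch on three rows taken from the data above; labels_df is just an illustrative name, not something used elsewhere in this answer:

```r
library(dplyr)

# Three rows from the original data, enough to show the idea
df_small <- data.frame(
  Country  = c("Britain", "Italy", "Spain"),
  Poor2000 = c(69, 50, 80),
  Poor2015 = c(61, 52, 84)
)

# pmax() takes the element-wise maximum of the two columns,
# so no group_by(Country) is needed to get one value per row
labels_df <- df_small %>%
  mutate(Diff = Poor2015 - Poor2000,
         ypos = pmax(Poor2000, Poor2015))

print(labels_df)
```

The grouped max() version gives the same ypos values as long as each country appears on exactly one row.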
Then, we can plot both new dataframes in ggplot2 and select only the countries of interest by using the subset function:
ggplot(subset(DF, Poor != "Diff" & Country %in% c("Britain","South Korea","Sweden")),
       aes(x = Country, y = value, fill = Poor))+
  geom_col(position = position_dodge())+
  geom_text(aes(label = value), position = position_dodge(0.9), vjust = -0.5, show.legend = FALSE)+
  geom_text(inherit.aes = FALSE,
            data = subset(DF_second_label, Country %in% c("Britain","South Korea","Sweden")),
            aes(x = Country,
                y = ypos+10,
                label = Diff), color = "darkgreen", size = 6, show.legend = FALSE)+
  labs(x = "", y = "GDP per Person", title = "Poor in 2000 & 2015")+
  theme(plot.title = element_text(hjust = 0.5))
And you get:
Reproducible example
df <- data.frame(Country = c("Britain","Germany", "United States", "France", "South Korea", "Italy","Japan","Spain","Sweden"),
Poor2000 = c(69,50,74,52,79,50,80,80,90),
Poor2015 = c(61,48,73,50,73,52,78,84,82),
Rich2000 = c(848,311,290,270,212,180,294,143,148))
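If you want all nine countries in one plot, as in the question, one option is simply to drop the subset() calls. Below is a self-contained sketch under that assumption. The names poor_long and poor_diff are illustrative only, and the smaller label size and rotated axis text are my own guesses at what keeps nine groups readable, so adjust to taste:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  Country  = c("Britain", "Germany", "United States", "France", "South Korea",
               "Italy", "Japan", "Spain", "Sweden"),
  Poor2000 = c(69, 50, 74, 52, 79, 50, 80, 80, 90),
  Poor2015 = c(61, 48, 73, 50, 73, 52, 78, 84, 82)
)

# Long format: one row per country and year (9 countries x 2 years)
poor_long <- df %>%
  pivot_longer(-Country, names_to = "Poor", values_to = "value")

# One row per country: change between the years and a height for its label
poor_diff <- df %>%
  group_by(Country) %>%
  mutate(Diff = Poor2015 - Poor2000, ypos = max(Poor2000, Poor2015)) %>%
  ungroup()

p <- ggplot(poor_long, aes(x = Country, y = value, fill = Poor)) +
  geom_col(position = position_dodge()) +
  # value on top of each bar
  geom_text(aes(label = value), position = position_dodge(0.9),
            vjust = -0.5, size = 3, show.legend = FALSE) +
  # difference 2015 - 2000 above each pair of bars
  geom_text(data = poor_diff, inherit.aes = FALSE,
            aes(x = Country, y = ypos + 10, label = Diff),
            color = "darkgreen", show.legend = FALSE) +
  labs(x = "", y = "GDP per Person", title = "Poor in 2000 & 2015") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

Note that the countries will appear in alphabetical order on the x-axis unless Country is converted to a factor with the levels in the order you want.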
Upvotes: 3