Reputation: 33
I have produced a line graph something that looks like this
I have the data set of 50 countries and its GDP for last 10 years.
Sample data:
Country variable value
China Y2007 3.55218e+12
USA Y2007 1.45000e+13
Japan Y2007 4.51526e+12
UK Y2007 3.06301e+12
Russia Y2007 1.29971e+12
Canada Y2007 1.46498e+12
Germany Y2007 3.43995e+12
India Y2007 1.20107e+12
France Y2007 2.66311e+12
SKorea Y2007 1.12268e+12
I generated the line graph using the code
GDP_lineplot = ggplot(data=GDP_linechart, aes(x=variable,y=value)) +
geom_line() +
scale_y_continuous(name = "GDP(USD in Trillions)",
breaks = c(0.0e+00,5.0e+12,1.0e+13,1.5e+13),
labels = c(0,5,10,15)) +
scale_x_discrete(name = "Years", labels = c(2007,"",2009,"",2011,"",2013,"",2015))
The idea is to make the graph look like this.
I tried adding
group=country, color = country
It outputs coloring all the countries.
How can I color the countries with top 4 and the rest?
PS: I am still naive with R.
Upvotes: 3
Views: 8089
Reputation: 42592
By plotting subsets, the other groups aren't included in the colour legend on the right. The alternative approach below manipulates factor levels and uses a customized color scale to overcome this.
It is assumed that GDP_long
contains the data in long format. This is in line with the data shown by the OP (GDP_lineplot
, but see Data section below for differences). To manipulate factor levels, the forcats
package is used (and data.table
).
library(data.table)
library(forcats)
# coerce to data.table, reorder factors by values in last = most actual year
setDT(GDP_long)[, Country := fct_reorder(Country, -value, last)]
# create new factor which collapses all countries to "Other" except the top 4 countries
GDP_long[, top_country := fct_other(Country, keep = head(levels(Country), 4))]
library(ggplot2)
ggplot(GDP_long, aes(Year, value/1e12, group = Country, colour = top_country)) +
geom_point() + geom_line(size = 1) + theme_bw() + ylab("GDP(USD in Trillions)") +
scale_colour_manual(name = "Country",
values = c("green3", "orange", "blue", "red", "grey"))
The chart is now quite similar to the expected result. The lines of the top 4 countries are displayed in different colours while the other countries are displayed in grey but do appear in the colour legend to the right.
Note that the group
aesthetic is still needed so that a single line is plotted for each country while colour
is controlled by the levels of top_country
.
The data set is too large to be reproduced here (even with dput()
). The structure
str(GDP_long)
'data.frame': 1763 obs. of 3 variables:
$ Country: chr "Afghanistan" "Albania" "Algeria" "Andorra" ...
$ Year : int 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
$ value : num 9.84e+09 1.07e+10 1.35e+11 4.01e+09 6.04e+10 ...
is similar to OP's data with the exception that the variable
column already is converted to an integer column year
. This will give a nicely formatted x-axis without additional effort.
Upvotes: 3
Reputation: 5966
My apologies I missed the part about only coloring a subset of the countries... in the geom_line calls you can add the subsetting that suits your needs.
df <- data.frame(Country=rep(LETTERS[1:10], each=5),
Year=rep(2007:2011, length.out=10),
value=rnorm(50))
ggplot(df) +
geom_line(data=df[21:50, ], aes(x=Year, y=value, group=Country), color="#999999") +
geom_line(data=df[1:20, ], aes(Year, y=value, color=Country))
Upvotes: 1