Kishan J

Reputation: 33

How can I color a line graph by grouping the variables in R?

I have produced a line graph that looks something like this:

Generated using ggplot2

I have a data set of 50 countries and their GDP for the last 10 years.
Sample data:

Country variable value  
China   Y2007    3.55218e+12
USA     Y2007    1.45000e+13
Japan   Y2007    4.51526e+12
UK      Y2007    3.06301e+12
Russia  Y2007    1.29971e+12 
Canada  Y2007    1.46498e+12
Germany Y2007    3.43995e+12 
India   Y2007    1.20107e+12
France  Y2007    2.66311e+12
SKorea  Y2007    1.12268e+12

I generated the line graph using the code

GDP_lineplot = ggplot(data=GDP_linechart, aes(x=variable,y=value)) + 
  geom_line() + 
  scale_y_continuous(name = "GDP(USD in Trillions)", 
                     breaks = c(0.0e+00,5.0e+12,1.0e+13,1.5e+13), 
                     labels = c(0,5,10,15)) + 
  scale_x_discrete(name = "Years", labels = c(2007,"",2009,"",2011,"",2013,"",2015))

The idea is to make the graph look like this. How can I plot the colors?

I tried adding

group=country, color = country

But that gives every country its own color.

How can I color only the top 4 countries and show the rest in a single color?

PS: I am still naive with R.

Upvotes: 3

Views: 8089

Answers (2)

Uwe

Reputation: 42592

By plotting subsets, the other groups aren't included in the colour legend on the right. The alternative approach below manipulates factor levels and uses a customized color scale to overcome this.

Preparing data

It is assumed that GDP_long contains the data in long format. This is in line with the data shown by the OP (GDP_linechart, but see the Data section below for differences). To manipulate factor levels, the forcats package is used (along with data.table).

library(data.table)
library(forcats)
# coerce to data.table, reorder countries by their value in the last (most recent) year
# (last picks the final row per country, so rows are assumed to be sorted by Year)
setDT(GDP_long)[, Country := fct_reorder(Country, -value, last)]
# create a new factor that collapses all countries except the top 4 into "Other"
GDP_long[, top_country := fct_other(Country, keep = head(levels(Country), 4))]
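
To make the effect of the two steps above easier to see, here is a minimal sketch on made-up data. The table toy and its values are hypothetical and chosen only so that the ranking in the last year differs from alphabetical order.

```r
library(data.table)
library(forcats)

# Hypothetical toy data: six countries, two years, rows sorted by Year
toy <- data.table(
  Country = rep(c("A", "B", "C", "D", "E", "F"), times = 2),
  Year    = rep(c(2007L, 2008L), each = 6),
  value   = c(6, 5, 4, 3, 2, 1,   # 2007
              1, 6, 3, 5, 2, 4)   # 2008
)

# Order the levels by descending value in the last row of each country
toy[, Country := fct_reorder(Country, -value, last)]
# Keep the first four levels, lump everything else into "Other"
toy[, top_country := fct_other(Country, keep = head(levels(Country), 4))]

levels(toy$Country)
levels(toy$top_country)
```

The levels of top_country are what scale_colour_manual() later matches its colours against, in order, so the fifth colour ends up on "Other".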

Create plot

library(ggplot2)
ggplot(GDP_long, aes(Year, value/1e12, group = Country, colour = top_country)) + 
  geom_point() + geom_line(size = 1) + theme_bw() + ylab("GDP(USD in Trillions)") +
  scale_colour_manual(name = "Country", 
                      values = c("green3", "orange", "blue", "red", "grey"))


The chart is now quite similar to the expected result. The lines of the top 4 countries are displayed in different colours while the other countries are displayed in grey but do appear in the colour legend to the right.

Note that the group aesthetic is still needed so that a single line is plotted for each country, while colour is controlled by the levels of top_country.

Data

The data set is too large to be reproduced here (even with dput()). The structure

str(GDP_long)
'data.frame':   1763 obs. of  3 variables:
 $ Country: chr  "Afghanistan" "Albania" "Algeria" "Andorra" ...
 $ Year   : int  2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
 $ value  : num  9.84e+09 1.07e+10 1.35e+11 4.01e+09 6.04e+10 ...

is similar to the OP's data, except that the variable column has already been converted to an integer column Year. This gives a nicely formatted x-axis without additional effort.
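
If your data still has the OP's variable column with values like "Y2007", one possible way to derive an integer Year column is sketched below. It uses only base R. The three rows are copied from the OP's sample, and the approach assumes every entry has the form "Y" followed by the year.

```r
# A few rows in the OP's layout (values copied from the sample data)
GDP_linechart <- data.frame(
  Country  = c("China", "USA", "Japan"),
  variable = c("Y2007", "Y2007", "Y2007"),
  value    = c(3.55218e+12, 1.45000e+13, 4.51526e+12)
)

GDP_long <- GDP_linechart
# Strip the leading "Y" and convert the remainder to integer
GDP_long$Year <- as.integer(sub("^Y", "", GDP_long$variable))

str(GDP_long)
```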

Upvotes: 3

emilliman5

Reputation: 5966

My apologies, I missed the part about coloring only a subset of the countries. In the geom_line calls you can add whatever subsetting suits your needs.

# toy data: 10 countries (A-J), 5 years each, random values
df <- data.frame(Country=rep(LETTERS[1:10], each=5),
    Year=rep(2007:2011, times=10), 
    value=rnorm(50))

ggplot(df) +
 geom_line(data=df[21:50, ], aes(x=Year, y=value, group=Country), color="#999999") +
 geom_line(data=df[1:20, ], aes(Year, y=value, color=Country))
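
The row indices above fit the toy data. For real data, the subsets could instead be derived from the values themselves. The following is a sketch under the assumption that "top 4" means the four largest values in the most recent year; the names latest, top4 and is_top are introduced here for illustration only.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only to make the toy data reproducible
df <- data.frame(Country = rep(LETTERS[1:10], each = 5),
                 Year    = rep(2007:2011, times = 10),
                 value   = rnorm(50))

# Rank countries by their value in the most recent year
latest <- df[df$Year == max(df$Year), ]
top4   <- head(latest$Country[order(-latest$value)], 4)

# Logical index: TRUE for rows belonging to a top-4 country
is_top <- df$Country %in% top4

p <- ggplot(mapping = aes(x = Year, y = value, group = Country)) +
  geom_line(data = df[!is_top, ], colour = "#999999") +   # all other countries in grey
  geom_line(data = df[is_top, ], aes(colour = Country))   # top 4 in distinct colours
print(p)
```

As with the index-based version, only the four coloured countries appear in the legend.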


Upvotes: 1
