Any idea how to add labels directly to my plot (geom_text)?
Here is my sample data frame. I am plotting three curves (confirmed, deaths, recovered), but how can I also add the column names as labels? I read the data frame from a CSV file.
print (data)
date confirmed deaths recovered
1 2020-12-01 63883985 1481306 41034934
2 2020-12-02 64530517 1493742 41496318
3 2020-12-03 65221040 1506260 41932091
4 2020-12-04 65899441 1518670 42352021
5 2020-12-05 66540034 1528868 42789879
6 2020-12-06 67073728 1536056 43103827
Here is my code:
data <- structure(list(date = structure(1:6, .Label = c("2020-12-01",
"2020-12-02", "2020-12-03", "2020-12-04", "2020-12-05", "2020-12-06"
), class = "factor"), confirmed = c(63883985L, 64530517L, 65221040L,
65899441L, 66540034L, 67073728L), deaths = c(1481306L, 1493742L,
1506260L, 1518670L, 1528868L, 1536056L), recovered = c(41034934L,
41496318L, 41932091L, 42352021L, 42789879L, 43103827L)), row.names = c(NA,
6L), class = "data.frame")
library(ggplot2)
library(scales) # for unit_format()

ggplot(data, aes(x = date, y = confirmed, group = 1)) +
  geom_line(colour = "blue", size = 1, aes(date, confirmed)) +
  scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6)) +
  geom_line(color = "red", size = 1, aes(date, deaths)) +
  geom_line(color = "#1EACB0", size = 1, aes(date, recovered))
My current plot shows the three lines, but without labels.
I also tried adding label=colnames(stats_data) to the ggplot call, but it doesn't work that way.
As mentioned in the post linked by Roman, ggrepel is a good option for this. Note that you can adjust where the labels fall using the variable lab_date that I created.
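To see where the labels end up with the sample data, the sketch below repeats the lab_date steps in base R without pipes. It assumes the dates have already been converted with as.Date(), and it uses format() in place of as.character() so the result is the plain YYYY-MM-DD string. With six dates the median falls halfway between the 3rd and the 4th, and the half day is dropped when the date is turned back into a string, so the labels go on 2020-12-03. The last line shows why filter(date == lab_date) works: comparing a Date with a character string converts the string with as.Date() first.

```r
# sample dates from the question, already stored as Date
dates <- as.Date(c("2020-12-01", "2020-12-02", "2020-12-03",
                   "2020-12-04", "2020-12-05", "2020-12-06"))

# days since 1970-01-01; the median of six values sits between the 3rd and 4th
mid <- unname(quantile(as.numeric(dates), 0.5))

# back to a Date (the half day is kept internally), then to a day-only string
lab_date <- format(as.Date(mid, origin = "1970-01-01"))
print(lab_date)  # "2020-12-03"

# Date == character: the string is converted with as.Date() before comparing
print(sum(dates == lab_date))  # 1
```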
# load packages
library(tidyverse)
library(scales)
library(ggrepel)
# process data for plotting
data1 <- data %>%
  mutate(date = as.Date(date)) %>%
  pivot_longer(cols = -date, names_to = "category", values_to = "cases") %>%
  mutate(category = factor(category))
# set color scheme with named vector
color_scheme <- setNames(c("blue", "red", "#1EACB0"), unique(data1$category))
# determine position of label
lab_date <- data1$date %>%
  as.numeric(.) %>%                     # convert to numeric to find the desired position
  quantile(., 0.5) %>%                  # selects the middle of the range, but you can adjust as needed
  as.Date(., origin = "1970-01-01") %>% # convert back to a date
  as.character()                        # convert to a string for matching in the geom_label_repel call
# plot lines with labels and drop legend
data1 %>%
  ggplot(data = ., aes(x = date, y = cases, color = category)) +
  geom_line() +
  geom_label_repel(aes(label = category),
                   data = data1 %>% filter(date == lab_date)) +
  scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6)) +
  scale_color_manual(values = color_scheme) +
  theme(legend.position = "none")
This puts a label with the category name on each line at lab_date and drops the legend.
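Why a named vector for the colors: scale_color_manual() matches named values to categories by name, while unnamed values are assigned by position to the categories present. The small sketch below uses a hypothetical subset of the sample data in which the confirmed series is missing; ggplot_build() is only used to read back the colour each line actually received.

```r
library(ggplot2)

# hypothetical subset: only "deaths" and "recovered", "confirmed" is absent
df <- data.frame(
  date = as.Date(rep(c("2020-12-01", "2020-12-02"), times = 2)),
  category = factor(rep(c("deaths", "recovered"), each = 2)),
  cases = c(1481306, 1493742, 41034934, 41496318)
)

named <- c(confirmed = "blue", deaths = "red", recovered = "#1EACB0")
unnamed <- c("blue", "red", "#1EACB0")

p_named <- ggplot(df, aes(date, cases, color = category)) +
  geom_line() +
  scale_color_manual(values = named)    # matched by category name

p_unnamed <- ggplot(df, aes(date, cases, color = category)) +
  geom_line() +
  scale_color_manual(values = unnamed)  # matched by position

# group 1 is "deaths", group 2 is "recovered"
built_named <- ggplot_build(p_named)$data[[1]]
built_unnamed <- ggplot_build(p_unnamed)$data[[1]]
unique(built_named[, c("group", "colour")])   # deaths keeps red
unique(built_unnamed[, c("group", "colour")]) # deaths shifts to blue
```

With the named vector, deaths stays red whichever categories are present; with the unnamed one, the colors slide along as soon as a category drops out.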
A few notes with updates:
- The colors are set with a named vector in scale_color_manual, which preserves the color scheme even if the order of the categories changes or one of them is absent.
- You can also set the label position directly, e.g. lab_date <- "2020-12-03", or whatever date you need.
- Using geom_label instead of geom_label_repel gives almost exactly the same result, so ggrepel might be considered overkill for this small number of labels, although it does help move the labels off the lines if that's important.
- plotly::ggplotly doesn't support ggrepel or even ggplot2::geom_label. So if you need this in plotly, one option is to change geom_label_repel to geom_text, although the labels will then be drawn on top of the lines unless you adjust their y position:

plotly::ggplotly(
  data1 %>%
    ggplot(data = ., aes(x = date, y = cases, color = category)) +
    geom_line() +
    geom_text(aes(label = category),
              data = data1 %>%
                filter(date == lab_date) %>%
                mutate(cases = cases + 2e6)) + # shift the labels up so they don't overplot the lines
    scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6)) +
    scale_color_manual(values = color_scheme) +
    theme(legend.position = "none")
)
This produces a plotly version of the plot with a text label just above each line.
How much you need to shift the labels depends on the line thickness, the specific values in your data and the size of your plot, so this is more of a hack than a robust solution.
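One way to make the hand-tuned 2e6 a little less specific to this data is to express the shift as a fraction of the y range. This is only a sketch under assumptions: the 3% fraction is arbitrary, lab_date is fixed to "2020-12-03", and the question's data is rebuilt with data.frame() (dates as Date). For the sample data the y range is about 65.6 million, so the shift comes out near 2e6. It still does not adapt to plot size or font size.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)

data <- data.frame(
  date = as.Date(c("2020-12-01", "2020-12-02", "2020-12-03",
                   "2020-12-04", "2020-12-05", "2020-12-06")),
  confirmed = c(63883985, 64530517, 65221040, 65899441, 66540034, 67073728),
  deaths = c(1481306, 1493742, 1506260, 1518670, 1528868, 1536056),
  recovered = c(41034934, 41496318, 41932091, 42352021, 42789879, 43103827)
)

data1 <- data %>%
  pivot_longer(cols = -date, names_to = "category", values_to = "cases") %>%
  mutate(category = factor(category))

lab_date <- "2020-12-03"

# shift = 3% of the y range (an arbitrary fraction; tune it for your plot)
offset <- 0.03 * diff(range(data1$cases))

labels_df <- data1 %>%
  filter(date == lab_date) %>%
  mutate(cases = cases + offset)

p <- ggplot(data1, aes(x = date, y = cases, color = category)) +
  geom_line() +
  geom_text(aes(label = category), data = labels_df) +
  scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6)) +
  scale_color_manual(values = c(confirmed = "blue", deaths = "red",
                                recovered = "#1EACB0")) +
  theme(legend.position = "none")

# plotly::ggplotly(p)  # if plotly is installed
p
```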
Problems like this one usually come down to reshaping the data: ggplot2 works best with data in long format, and the data here is in wide format. See this post on how to reshape data from wide to long format.
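Applied to the six-row data frame from the question, the reshape turns the three value columns into one name column and one value column. This sketch assumes the dplyr, tidyr and ggplot2 packages and rebuilds the question's data with data.frame(), with dates stored as Date instead of the factor in the dput() output:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

data <- data.frame(
  date = as.Date(c("2020-12-01", "2020-12-02", "2020-12-03",
                   "2020-12-04", "2020-12-05", "2020-12-06")),
  confirmed = c(63883985, 64530517, 65221040, 65899441, 66540034, 67073728),
  deaths = c(1481306, 1493742, 1506260, 1518670, 1528868, 1536056),
  recovered = c(41034934, 41496318, 41932091, 42352021, 42789879, 43103827)
)

# wide: one column per series -> long: one row per (date, series) pair
data_long <- data %>%
  pivot_longer(-date, names_to = "cases", values_to = "count")

dim(data_long)  # 18 rows (6 dates x 3 series), 3 columns

# a single geom_line() call now draws all three series, coloured by the new column
p <- ggplot(data_long, aes(date, count, colour = cases)) +
  geom_line()
p
```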
library(dplyr)
library(tidyr)
library(ggplot2)
stats_data %>%
  select(-starts_with("diff")) %>%
  pivot_longer(-date, names_to = "cases", values_to = "count") %>%
  mutate(cases = factor(cases, levels = c("confirmed", "deaths", "recovered"))) %>%
  ggplot(aes(date, count, colour = cases)) +
  geom_line() +
  scale_color_manual(values = c("blue", "red", "#1EACB0"))
Data
stats_data <- read.table(text = "
date confirmed diff.x deaths diff.y recovered
'2020-01-22' 555 555 17 17 28
'2020-01-23' 654 99 18 1 30
'2020-01-24' 941 287 26 8 36
'2020-01-25' 1434 493 42 16 39
'2020-01-26' 2118 684 56 14 52
'2020-01-27' 2927 809 82 26 61
", header = TRUE, colClasses = c("Date", rep("numeric", 5)))
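Once the data is long, labelling each line directly (what the question asked for) is one more layer. A sketch, assuming the same stats_data (repeated so the block runs on its own): take each series' row at its last date and draw it with geom_text(), left-aligned, with some extra room on the right of the x axis. The 25% expansion and the 0.1-day nudge are arbitrary choices. With this data the deaths and recovered labels end up close together, so ggrepel::geom_text_repel(), as in the other answer, may still be needed to keep them apart.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

stats_data <- read.table(text = "
date confirmed diff.x deaths diff.y recovered
'2020-01-22' 555 555 17 17 28
'2020-01-23' 654 99 18 1 30
'2020-01-24' 941 287 26 8 36
'2020-01-25' 1434 493 42 16 39
'2020-01-26' 2118 684 56 14 52
'2020-01-27' 2927 809 82 26 61
", header = TRUE, colClasses = c("Date", rep("numeric", 5)))

long <- stats_data %>%
  select(-starts_with("diff")) %>%
  pivot_longer(-date, names_to = "cases", values_to = "count") %>%
  mutate(cases = factor(cases, levels = c("confirmed", "deaths", "recovered")))

# one row per series: its value on its own last date
end_labels <- long %>%
  group_by(cases) %>%
  filter(date == max(date)) %>%
  ungroup()

p <- ggplot(long, aes(date, count, colour = cases)) +
  geom_line() +
  # hjust = 0 left-aligns each label at its point; nudge_x is in days on a Date axis
  geom_text(aes(label = cases), data = end_labels, hjust = 0, nudge_x = 0.1) +
  scale_color_manual(values = c("blue", "red", "#1EACB0")) +
  # extra space on the right so the labels are not cut off
  scale_x_date(expand = expansion(mult = c(0.05, 0.25))) +
  theme(legend.position = "none")
p
```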