Here is the .tsv file for the following script.
Source code:
library(ggplot2)
library(dplyr)
library(stringr)

adder_dd_with_yyyy_mm <- function(date) {
  if (str_count(date, "-") == 1) {
    return(paste(date, "01", sep = "-"))
  } else {
    return(date)
  }
}

date_abbreviator <- function(date) {
  month <- strftime(date, format = "%b")
  year <- strftime(date, format = "%y")
  return(paste(month, year, sep = "-"))
}

df <- read.csv("figure_metadata.tsv", sep = "\t")
df$date <- df$date %>%
  lapply(adder_dd_with_yyyy_mm) %>%
  lapply(date_abbreviator)

ggplot(data = df) +
  geom_bar(mapping = aes(x = date, fill = clade))
I'm sure there are no missing values in the date and clade columns.
I'm getting the error "Error in abs(x) : non-numeric argument to mathematical function". Can anyone help, please? Thanks.
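The result of your df$date reassignment is a list, not a vector: lapply always returns a list, so df$date becomes a list-column, which ggplot2 does not know how to map to the x-axis. This can be fixed in a number of ways: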
df$date <- df$date %>%
  lapply(adder_dd_with_yyyy_mm) %>%
  sapply(date_abbreviator)
## or
df$date <- df$date %>%
  lapply(adder_dd_with_yyyy_mm) %>%
  lapply(date_abbreviator) %>%
  unlist(.)
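To see the difference outside ggplot2, here is a small base-R check on a few made-up date strings (the real .tsv isn't reproduced here). pad_day is a hypothetical stand-in for adder_dd_with_yyyy_mm that counts hyphens without stringr:

```r
# Made-up values standing in for df$date
dates <- c("2020-04-09", "2020-06", "2020-03-13")

# Hypothetical stand-in for adder_dd_with_yyyy_mm: count hyphens by deleting
# every other character, and append "-01" when there is only one
pad_day <- function(d) {
  if (nchar(gsub("[^-]", "", d)) == 1) paste0(d, "-01") else d
}

as_list    <- lapply(dates, pad_day)                     # lapply always returns a list
as_vector  <- sapply(dates, pad_day, USE.NAMES = FALSE)  # simplified to a character vector
via_unlist <- unlist(lapply(dates, pad_day))             # list flattened to a character vector

class(as_list)                    # "list"
class(as_vector)                  # "character"
identical(as_vector, via_unlist)  # TRUE
```

Only the two character-vector versions are suitable for putting back into df$date.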
Both of those techniques run a slight risk (without looking more in-depth at your processing): if either function returns an object of length other than 1, the reassignment may fail. However, if all of that goes well, then your original plot call works:
ggplot(data = df) +
  geom_bar(mapping = aes(x = date, fill = clade))
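If you'd rather have that length-1 assumption checked for you, base R's vapply (not used in the original answer) stops with an error as soon as a function returns something of the wrong length or type. A sketch, with abbreviate_date as a hypothetical stand-in for date_abbreviator (note that %b month names depend on your locale):

```r
# Hypothetical stand-in for date_abbreviator
abbreviate_date <- function(d) format(as.Date(d), "%b-%y")

dates <- c("2020-04-09", "2020-06-01")  # made-up values

# vapply guarantees one character value per input element
abbrevs <- vapply(dates, abbreviate_date, FUN.VALUE = character(1), USE.NAMES = FALSE)

# A function that returns two values for one input makes vapply error out
# instead of silently producing a mis-shaped result
bad <- function(d) if (d == "2020-06-01") c("a", "b") else "a"
res <- tryCatch(vapply(dates, bad, character(1)),
                error = function(e) "vapply refused")
```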
Since your x-axis is categorical, the labels will always be bunched up (and, as plain strings, they are sorted alphabetically rather than chronologically). It might be better to treat the dates as real Date-class objects.
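As a quick illustration of the ordering problem with made-up labels: character values on a discrete axis are, as far as I know, sorted alphabetically, so Jan-21 lands before Jun-20. If you do keep the labels as text, one workaround (an alternative, not the approach taken below) is a factor whose levels follow the underlying dates:

```r
labels <- c("Mar-20", "Apr-20", "Jun-20", "Jan-21")
sort(unique(labels))
# [1] "Apr-20" "Jan-21" "Jun-20" "Mar-20"

# Made-up dates; build the labels, then order the factor levels chronologically
dates <- as.Date(c("2021-01-05", "2020-03-13", "2020-06-18", "2020-04-09"))
lab <- format(dates, "%b-%y")                  # month names depend on the locale
chron_levels <- unique(lab[order(dates)])      # labels in date order
lab_factor <- factor(lab, levels = chron_levels)
levels(lab_factor)
```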
This process is also a bit slow, mostly because you are operating on one date at a time; R is good at working on whole vectors at once, so your labeling might do better with something like:
func1 <- function(x) {
  paste0(x, ifelse(str_count(x, "-") == 1, "-01", ""))
}
func1(head(df$date))
# [1] "2020-04-09" "2020-04-08" "2020-04-20" "2020-06-18" "2020-06-18" "2020-03-13"
which does all of the first function's work in one step. The second step can be vectorized the same way:
func2 <- function(x) {
  format(as.Date(x), format = "%b-%y")
}
func2(func1(head(df$date)))
# [1] "Apr-20" "Apr-20" "Apr-20" "Jun-20" "Jun-20" "Mar-20"
These could likely be combined into a single function:
func <- function(x) {
  format(as.Date(paste0(x, ifelse(str_count(x, "-") == 1, "-01", ""))),
         format = "%b-%y")
}
func(head(df$date))
# [1] "Apr-20" "Apr-20" "Apr-20" "Jun-20" "Jun-20" "Mar-20"
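If you want to drop the stringr dependency, the hyphen count can be done in base R as well. func_base below is a hypothetical variant of func (it assumes every value is either YYYY-MM or YYYY-MM-DD), and the last line checks that the whole-vector call agrees with calling it one value at a time:

```r
func_base <- function(x) {
  # Count hyphens per element by deleting every non-hyphen character
  n_hyphens <- nchar(gsub("[^-]", "", x))
  # Append "-01" only to the YYYY-MM values, then format as e.g. "Apr-20"
  padded <- paste0(x, ifelse(n_hyphens == 1, "-01", ""))
  format(as.Date(padded), format = "%b-%y")
}

x <- c("2020-04-09", "2020-06", "2020-03-13")  # made-up values
func_base(x)

# Whole-vector call vs. one element at a time: same result
identical(func_base(x), vapply(x, func_base, character(1), USE.NAMES = FALSE))
```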
Ultimately, though, I generally prefer and recommend keeping date-like values as Date-class objects and letting ggplot2 do the formatting.
# df <- read.csv(...)
# Pad "YYYY-MM" values (shorter than 10 characters) with "-01", then convert to Date
df$date <- as.Date(paste0(df$date, ifelse(nchar(df$date) < 10, "-01", "")))
head(df$date)
# [1] "2020-04-09" "2020-04-08" "2020-04-20" "2020-06-18" "2020-06-18" "2020-03-13"
# Round every date down to the first day of its month
lubridate::floor_date(head(df$date), unit = "months")
# [1] "2020-04-01" "2020-04-01" "2020-04-01" "2020-06-01" "2020-06-01" "2020-03-01"
df$date <- as.Date(lubridate::floor_date(df$date, unit = "months"))
head(df$date)
# [1] "2020-04-01" "2020-04-01" "2020-04-01" "2020-06-01" "2020-06-01" "2020-03-01"
class(df$date)
# [1] "Date"

ggplot(data = df) +
  geom_bar(mapping = aes(x = date, fill = clade)) +
  scale_x_date(labels = function(z) format(z, format = "%b-%y"))
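If you'd rather not pull in lubridate just for the month rounding, here is a base-R sketch that should give the same result for Date vectors. floor_month is a hypothetical helper, not part of any package; it rebuilds each date as the first day of its month:

```r
# Hypothetical base-R helper: round each Date down to the first day of its month
floor_month <- function(x) as.Date(format(x, "%Y-%m-01"))

d <- as.Date(c("2020-04-09", "2020-04-20", "2020-06-18", "2020-03-13"))  # made-up values
floor_month(d)
# [1] "2020-04-01" "2020-04-01" "2020-06-01" "2020-03-01"
```

With that, df$date <- floor_month(df$date) would take the place of the lubridate::floor_date() step.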
If you need a tick for every month, then (as with the categorical axis above) it will get busy. We can control which ticks are shown using breaks=, and then optionally rotate the labels.
ggplot(data = df) +
  geom_bar(mapping = aes(x = date, fill = clade)) +
  scale_x_date(
    # one break per month, spanning the Date axis limits z
    breaks = function(z) seq(lubridate::floor_date(z[1], "month"),
                             lubridate::ceiling_date(z[2], "month"),
                             by = "month"),
    labels = function(z) format(z, format = "%b-%y")
  ) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
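The same kind of monthly breaks can be built without lubridate. month_breaks is a hypothetical base-R helper; like the breaks= function above, it assumes ggplot2 hands it the two Date axis limits, and it always runs to the first day of the month after the upper limit:

```r
# Hypothetical base-R helper: monthly breaks covering a pair of Date limits
month_breaks <- function(limits) {
  start      <- as.Date(format(limits[1], "%Y-%m-01"))  # first day of the first month
  last_floor <- as.Date(format(limits[2], "%Y-%m-01"))  # first day of the last month
  end <- seq(last_floor, by = "month", length.out = 2)[2]  # first day of the following month
  seq(start, end, by = "month")
}

month_breaks(as.Date(c("2020-01-15", "2020-03-10")))
# [1] "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01"
```

It could then be passed as scale_x_date(breaks = month_breaks, labels = function(z) format(z, format = "%b-%y")).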
Edited to fix a breaks= bug. Note that Jan-20 and Aug-21 are included mostly because most plot functions expand the axes a little when working with numerical data.