Rajan Raju

Reputation: 123

Why am I getting "Error in abs(x) : non-numeric argument to mathematical function" in ggplot?

Here is the .tsv file for the following script.

Source code:

library(ggplot2)
library(dplyr)
library(stringr)


adder_dd_with_yyyy_mm <- function(date) {
  if(str_count(date, "-") == 1) {
    return(paste(date, "01", sep = "-"))
  } else {
    return(date)
  }
}

date_abbreviator <- function(date) {
  month <- strftime(date, format = "%b")
  year <- strftime(date, format = "%y")
  return(paste(month, year, sep = "-"))
}


df <- read.csv("figure_metadata.tsv", sep = "\t")

df$date <- df$date %>% lapply(adder_dd_with_yyyy_mm) %>%
  lapply(date_abbreviator)


ggplot(data = df) +
  geom_bar(mapping = aes(x = date, fill = clade))


I'm sure there are no missing values in the date and clade columns, yet I get the error "Error in abs(x) : non-numeric argument to mathematical function". Can anyone help, please? Thanks.

Upvotes: 0

Views: 1407

Answers (1)

r2evans

Reputation: 160687

The result of your df$date reassignment is a list, not an atomic vector, because lapply() always returns a list. This can be fixed in a number of ways:

df$date <- df$date %>%
  lapply(adder_dd_with_yyyy_mm) %>%
  sapply(date_abbreviator)
## or
df$date <- df$date %>%
  lapply(adder_dd_with_yyyy_mm) %>%
  lapply(date_abbreviator) %>%
  unlist(.)

Both of those techniques carry a slight risk (I have not looked at your processing in depth): if either function returns an object of length other than 1 for some element, the reassignment may fail. If all of that goes well, though, the original plotting code works:

ggplot(data = df) +
  geom_bar(mapping = aes(x = date, fill = clade))

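The length-1 risk mentioned above can be checked explicitly with base R's vapply(), which returns an atomic vector and stops if any element's result does not match the declared template. This is only a sketch: the sample dates and the helper abbreviate_one() are hypothetical stand-ins, not the functions from the question.

```r
# Hypothetical sample values standing in for df$date
dates <- c("2020-04-09", "2020-06-18", "2020-03-13")

# Hypothetical per-element helper, similar in spirit to date_abbreviator()
abbreviate_one <- function(d) format(as.Date(d), "%b-%y")

# lapply() always returns a list, which is what caused the original error
as_list <- lapply(dates, abbreviate_one)
is.list(as_list)       # TRUE

# vapply() returns a character vector and checks every result against FUN.VALUE
as_vec <- vapply(dates, abbreviate_one, FUN.VALUE = character(1), USE.NAMES = FALSE)
is.character(as_vec)   # TRUE

# A function that returns two values per element is rejected with an error
bad <- function(d) c(d, d)
result <- tryCatch(
  vapply(dates, bad, FUN.VALUE = character(1)),
  error = function(e) "vapply stopped: wrong result length"
)
```

Unlike sapply(), which quietly changes its return type when lengths differ, vapply() fails early, so a malformed date shows up at the conversion step rather than inside ggplot.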

Because your x-axis is categorical, every distinct label gets its own tick, so the labels will always be bunched up. It might be better to treat the dates as real Date-class objects.


This process is also a bit slow, mostly because you are operating on one $date at a time; R is good at working on whole vectors at once, so your labeling might do better with something like:

func1 <- function(x) {
  paste0(x, ifelse(str_count(x, "-") == 1, "-01", ""))
}
func1(head(df$date))
# [1] "2020-04-09" "2020-04-08" "2020-04-20" "2020-06-18" "2020-06-18" "2020-03-13"

to do all of the first function in one step. The second step can equally be vectorized,

func2 <- function(x) {
  format(as.Date(x), format = "%b-%y")
}
func2(func1(head(df$date)))
# [1] "Apr-20" "Apr-20" "Apr-20" "Jun-20" "Jun-20" "Mar-20"

These could likely be combined into a single function:

func <- function(x) {
  format(as.Date(paste0(x, ifelse(str_count(x, "-") == 1, "-01", ""))), 
         format = "%b-%y")
}
func(head(df$date))
# [1] "Apr-20" "Apr-20" "Apr-20" "Jun-20" "Jun-20" "Mar-20"
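The sample output above only contains full yyyy-mm-dd values, so it may be worth confirming that the padding step also handles yyyy-mm input. The sketch below repeats func1 from above and runs it on a small hypothetical vector (the values in x are made up for illustration, not taken from the .tsv file).

```r
library(stringr)

# Same definition as func1 above: append "-01" when only one "-" is present
func1 <- function(x) {
  paste0(x, ifelse(str_count(x, "-") == 1, "-01", ""))
}

# Hypothetical sample mixing yyyy-mm and yyyy-mm-dd values
x <- c("2020-04", "2020-04-09", "2021-01")

padded <- func1(x)
padded
# [1] "2020-04-01" "2020-04-09" "2021-01-01"

# Every padded value should now parse as a Date, with no NA
parsed <- as.Date(padded)
anyNA(parsed)
# [1] FALSE
```

The month abbreviations produced by the later format(..., "%b-%y") step depend on the session locale, so that part is left out of the check here.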

Ultimately, though, I generally prefer and recommend keeping date-like objects as Date-class objects and having ggplot2 do some formatting.

# df <- read.csv(...)
df$date <- as.Date(paste0(df$date, ifelse(nchar(df$date) < 10, "-01", "")))

head(df$date)
# [1] "2020-04-09" "2020-04-08" "2020-04-20" "2020-06-18" "2020-06-18" "2020-03-13"
lubridate::floor_date(head(df$date), unit="months")
# [1] "2020-04-01" "2020-04-01" "2020-04-01" "2020-06-01" "2020-06-01" "2020-03-01"

df$date <- as.Date(lubridate::floor_date(df$date, unit="months"))
head(df$date)
# [1] "2020-04-01" "2020-04-01" "2020-04-01" "2020-06-01" "2020-06-01" "2020-03-01"
class(df$date)
# [1] "Date"

ggplot(data = df) +
  geom_bar(mapping = aes(x = date, fill = clade)) +
  scale_x_date(labels = function(z) format(z, format = "%b-%y"))

Figure: ggplot2, with Date-class x-axis

If you need all of the axis ticks, then (as above) it will get busy. We can control which are shown using breaks=, and then optionally rotate the labels.

ggplot(data = df) +
  geom_bar(mapping = aes(x = date, fill = clade)) +
  scale_x_date(
    breaks = function(z) {
      seq(lubridate::floor_date(z[1], "month"),
          lubridate::ceiling_date(z[2], "month"),
          by = "month")
    },
    labels = function(z) format(z, format = "%b-%y")
  ) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

Figure: ggplot with all x-axis ticks/labels, rotated

*Edited to fix a breaks= bug. Note that Jan-20 and Aug-21 are included mainly because most plot functions expand the axes a little beyond the data when working with numerical data.
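As a possible shorthand for the monthly ticks and labels, scale_x_date() also accepts date_breaks and date_labels arguments. The sketch below uses a small hypothetical data frame (df_demo) in place of the question's file; the breaks it produces are chosen by ggplot2 and may not match the custom breaks= function above exactly.

```r
library(ggplot2)

# Hypothetical data standing in for the question's figure_metadata.tsv
df_demo <- data.frame(
  date  = as.Date(c("2020-03-01", "2020-04-01", "2020-04-01", "2020-06-01")),
  clade = c("A", "A", "B", "B")
)

# date_breaks sets the tick spacing, date_labels the strftime-style label format
p <- ggplot(data = df_demo) +
  geom_bar(mapping = aes(x = date, fill = clade)) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b-%y") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p
```

The custom functions remain the better choice if you need exact control over the first and last tick.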

Upvotes: 2
