Reputation: 223
I would like to plot my data on a logarithmic scale, since many of the data points are near zero while some are around 1, so everything near zero looks very crowded. The problem is that I have both positive and negative values and would like to plot all of them. So I want a negative logarithmic scale, 0, and a positive logarithmic scale.
So I would like to have ticks on the y axis like: -10, -1, -0.1, -0.01, 0, 0.01, 0.1, 1, 10
Is this somehow possible? The problem, of course, is that one must know when to "stop": for example, that 0.01 is small enough and 0.00001 is no longer necessary. So some sort of break is probably necessary. This thought came to me after posting the question the first time, so maybe my whole approach is not possible. If that's the case, I would be happy about any alternative suggestions.
I hope this introduction is not too confusing. Below is a reproducible example with normally distributed data. (Of course there is no "crowding" near 0 here, but the problem is the negative values; with runif(10) everything works without any issues.)
library(ggplot2)
df <- data.frame(x=1:10, y=rnorm(10))
p <- ggplot(data=df, aes(x=x, y=y)) + geom_point()
To obtain a logarithmic scale on both the positive and the negative side of the y axis, I defined a log_pm function ("log plus minus") and its inverse, and used scales::trans_new to create a new transformation for ggplot.
The idea is from "How to get a reversed, log10 scale in ggplot2?", which I tried to adapt to my needs.
log_pm <- function(x, base = exp(1)){
  ifelse(x < 0, -log((-x)+1, base),
         ifelse(x == 0, 0, log(x+1, base)))
}
log_pm_inv <- function(x, base = exp(1)){
  ifelse(x < 0, 1 - base^(-x),
         ifelse(x == 0, 0, base^x - 1))
}
log_pm_trans <- function(base = exp(1)) {
  trans <- function(x) log_pm(x, base)
  inv <- function(x) log_pm_inv(x, base)
  scales::trans_new(paste0("log_pm-", format(base)), trans, inv,
                    scales::log_breaks(base = base),
                    domain = c(-10, 10))
}
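As a quick sanity check on the two helper functions, here is a minimal base-R sketch that tests whether log_pm_inv undoes log_pm on a few values. The test vector x and the name roundtrip are made up for illustration. The functions are copied from above so that the block runs on its own.

```r
# Helper functions copied from the definitions above
log_pm <- function(x, base = exp(1)) {
  ifelse(x < 0, -log((-x) + 1, base),
         ifelse(x == 0, 0, log(x + 1, base)))
}
log_pm_inv <- function(x, base = exp(1)) {
  ifelse(x < 0, 1 - base^(-x),
         ifelse(x == 0, 0, base^x - 1))
}

# Illustrative test values on both sides of zero (an assumption, not real data)
x <- c(-5, -0.5, 0, 0.5, 5)

# ifelse() evaluates both branches on the whole vector, so log() of a
# negative number is computed and then discarded; this emits NaN warnings,
# which are suppressed here
roundtrip <- suppressWarnings(log_pm_inv(log_pm(x, base = 10), base = 10))

# TRUE if the inverse recovers the original values up to numerical tolerance
print(isTRUE(all.equal(roundtrip, x)))
```

If the check passes, the transform and its inverse are consistent with each other, which suggests looking at the breaks function rather than at the pair itself.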
I want to take my plot p from before and transform the y axis using the following code:
p + scale_y_continuous(trans=log_pm_trans(10))
However, this does not work as expected and leads to the following error and warnings:
Error in if (max == min) return(base^min) :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In ifelse(x < 0, -log((-x) + 1, base), ifelse(x == 0, 0, log(x + :
NaNs produced
2: In ifelse(x == 0, 0, log(x + 1, base)) : NaNs produced
3: In self$trans$breaks(limits) : NaNs produced
The error seems similar to the one described here: https://statisticsglobe.com/r-error-in-if-while-condition-missing-value-where-true-false-needed, but I have no idea what to do.
Any help is highly appreciated.
Thanks a lot!
Upvotes: 0
Views: 2284
Reputation: 19097
Here is a solution that should fix your problem. However, I do not think it is a good idea to show both positive and negative values on one scale after taking the log: the log of a number between 0 and 1 is negative, so it gets mixed up with your original negative numbers, especially when your original data contain many values near 0.
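To make that caveat concrete, here is a small base-R sketch with made-up values (exact powers of 10, chosen only for readability) that applies the sign-times-log mapping with base 10 rather than the natural log:

```r
# Illustrative values on both sides of zero (assumed, not from the question)
y <- c(-10, -0.1, 0.1, 10)

# Sign-times-log mapping, here with base 10
signed_log <- sign(y) * log10(abs(y))

print(data.frame(y = y, signed_log = signed_log))
# -10 and 0.1 both map to -1, while -0.1 and 10 both map to 1
```

Points with different signs can land on the same height, so the vertical position alone no longer tells you the sign of the original value.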
When you first create the data frame, include an extra column that specifies the Sign (i.e. positive or negative) of your y column.
Then, in the ggplot call, take the log of the absolute value of y and multiply it by the Sign column (+1, 0 or -1).
UPDATE: Credit to @Waldi for suggesting colouring the points by the Sign column for clarity.
Note that if your data contain 0, you'll need to modify labels in scale_color_discrete to include it.
library(tidyverse)
df <- data.frame(x=1:10, y=rnorm(10)) %>% mutate(Sign = sign(y))
df
#> x y Sign
#> 1 1 0.1743404 1
#> 2 2 -1.6907048 -1
#> 3 3 1.2044741 1
#> 4 4 -2.3685458 -1
#> 5 5 -1.0309279 -1
#> 6 6 -1.3224189 -1
#> 7 7 -0.9016194 -1
#> 8 8 0.7006805 1
#> 9 9 -1.4128617 -1
#> 10 10 -0.1607972 -1
ggplot(data=df, aes(x=x, y=log(abs(y))*Sign, color = as.character(Sign))) +
  geom_point() +
  labs(y = "y") +
  scale_color_discrete(name = "Legend", labels = c("Negative", "Positive"))
Created on 2022-03-12 by the reprex package (v2.0.1)
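As a possible alternative to the sign-times-log approach, the scales package ships scales::pseudo_log_trans, a signed transformation that is roughly linear near zero and logarithmic further out. The sketch below assumes sigma = 0.01, picked to match the smallest tick in the question, and takes the break positions from the question. The names sym_log and y_breaks are made up for illustration. The error in the question appears to come from scales::log_breaks receiving limits that include negative values, so explicit breaks are supplied here instead.

```r
library(ggplot2)

set.seed(1)  # fixed seed so the sketch is reproducible
df <- data.frame(x = 1:10, y = rnorm(10))

# sigma sets the width of the roughly linear region around zero;
# 0.01 is an assumed value, tune it to where the scale should "stop"
sym_log <- scales::pseudo_log_trans(sigma = 0.01, base = 10)

# Tick positions taken from the question
y_breaks <- c(-10, -1, -0.1, -0.01, 0, 0.01, 0.1, 1, 10)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  scale_y_continuous(trans = sym_log, breaks = y_breaks)
p
```

Breaks outside the range of the data are simply not drawn, so the same break vector can be reused for data with a wider range.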
Upvotes: 1