Reputation: 37
I am using plotly in R to generate a volcano plot. When I call the plot, it looks fine, but I also get a warning:
Warning message: Ignoring 5 observations
How do I find out what these 5 observations are? Or why they are being ignored? I need to check if there is a problem with these 5 data points. I have visually inspected my data, and can't see any problem, but it is a huge dataset so I could be missing something.
Could anyone advise with some debugging steps please?
EDIT: I'm not sure if it will help, as I suspect the problem is with my data, not code, but here is the code I'm using, with simplified data.
library(plotly)
name <- c("Name1", "Name2", "Name3")
log2FoldChange <- c(-2.7419374, 2.9655255, -1.7455225)
padj <- c(2.25e-27, 3.01e-24, 2.56e-25)
df <- data.frame(name, log2FoldChange, padj)
my_plot <- plot_ly(data = df,
x = df$log2FoldChange,
y = -log10(df$padj))
Upvotes: 0
Views: 1404
Reputation: 1865
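Regarding your question: you are using log10 to calculate the y-axis values. Note that the logarithm is not defined for negative numbers, so those produce NaN, while log10(0) is -Inf. An NA in padj also stays NA after the transformation. Points with missing (NA or NaN) coordinates are ignored when the plot is drawn.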
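As a minimal sketch of the point above, the following base R snippet shows what log10 does with each problem case and how to locate the affected positions. The padj values here are made up for illustration and are not taken from the question.

```r
# Hypothetical adjusted p-values covering the problem cases
padj <- c(2.25e-27, 0, NA, -0.01, 1)

# log10() of a negative number is NaN (R warns "NaNs produced"),
# log10(0) is -Inf, and NA stays NA
y <- suppressWarnings(-log10(padj))
y

# Positions whose transformed value is not a finite number
bad <- which(!is.finite(y))
bad
```

Running the same check on the real column, for example which(!is.finite(-log10(df$padj))), should point at the rows worth inspecting.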
As for the broader question of how to see which data is actually being plotted: with both ggplot and plotly you can extract the data from the plot object, but with plotly I found it hard to tell which data isn't plotted.
library(palmerpenguins)
#> Warning: package 'palmerpenguins' was built under R version 4.0.5
invisible(library(tidyverse))
#> Warning: package 'dplyr' was built under R version 4.0.3
invisible(library(plotly))
randomRows = sample(1:nrow(penguins), 10) #to replace any 10 rows with NA
penguins[randomRows, "body_mass_g"] = NA
penguins %>%
ggplot(aes(bill_length_mm, body_mass_g)) +
geom_point() -> plot_ggplot
plot_ggplot
#> Warning: Removed 11 rows containing missing values (geom_point).
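As you can see, it throws a warning that rows have been removed. The count can differ from 10 because penguins already contains a few rows with missing measurements.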
To get the plot data:
head(ggplot_build(plot_ggplot)$data[[1]])
#>      x    y PANEL group shape colour size fill alpha stroke
#> 1 39.1 3750     1    -1    19  black  1.5   NA    NA    0.5
#> 2 39.5 3800     1    -1    19  black  1.5   NA    NA    0.5
#> 3 40.3 3250     1    -1    19  black  1.5   NA    NA    0.5
#> 4   NA   NA     1    -1    19  black  1.5   NA    NA    0.5
#> 5 36.7 3450     1    -1    19  black  1.5   NA    NA    0.5
#> 6 39.3 3650     1    -1    19  black  1.5   NA    NA    0.5
The x and y columns hold the values plotted on the x and y axes.
In the case of plotly there is a similar way to extract the data, but it shows only the points that are plotted. I couldn't figure out a way to extract the values that are being ignored.
pp = plot_ly(penguins, x=~bill_depth_mm, y=~body_mass_g, type='scatter')
plotly_build(pp) -> plotly_data
#> No scatter mode specifed:
#> Setting the mode to markers
#> Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
#> Warning: Ignoring 11 observations
#> Warning: `arrange_()` was deprecated in dplyr 0.7.0.
#> Please use `arrange()` instead.
#> See vignette('programming') for more help
names(plotly_data$x$data[[1]])
#> [1] "x" "y" "type" "mode" "marker" "error_y" "error_x"
#> [8] "line" "xaxis" "yaxis" "frame"
#this gives your x-axis data
plotly_data$x$data[[1]]$x[1:5]
#> [1] 18.7 17.4 18.0 19.3 20.6
#this gives your y-axis data
plotly_data$x$data[[1]]$y[1:5]
#> [1] 3750 3800 3250 3450 3650
Created on 2021-07-02 by the reprex package (v0.3.0)
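Since the built plotly object only holds the points that were kept, one workaround is to look for the problem rows in the data frame itself, before plotting. The sketch below uses base R only and a made-up data frame shaped like the one in the question (the extra rows and their values are hypothetical). It assumes that the observations plotly ignores are the ones whose x or y value is missing after the transformation.

```r
# Hypothetical volcano-plot data with a few problem rows
df <- data.frame(
  name = c("Name1", "Name2", "Name3", "Name4", "Name5"),
  log2FoldChange = c(-2.7419374, 2.9655255, NA, 0.52, -1.7455225),
  padj = c(2.25e-27, NA, 0.04, 0, 2.56e-25)
)

# Compute exactly what would be passed to the plot
x <- df$log2FoldChange
y <- -log10(df$padj)

# Rows with a missing or non-finite coordinate (NA, NaN, Inf, -Inf)
suspect <- df[!(is.finite(x) & is.finite(y)), ]
suspect

# Number of rows with an NA/NaN coordinate, to compare with the
# count reported in the warning
n_missing <- sum(is.na(x) | is.na(y))
n_missing
```

If n_missing matches the number in the warning, the rows in suspect with NA or NaN coordinates are very likely the ignored observations. Rows that only show up because of Inf (a padj of exactly 0) are worth checking as well, even if they are not part of the warning count.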
Upvotes: 1