`non-finite value supplied` in ggstatsplot

Question

I am working with ggstatsplot to get visual representations of my statistical analyses.

I have numerous datasets, all very similar in make-up. Some work just fine, while others don't. data1 is a working example, and data2 doesn't work.

data1 <- structure(list(
     treatment = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 
     2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
     3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
     5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
     6L),
     .Label = c("negative_ctrl", "positive_ctrl", "treatmentA", "treatmentB", "treatmentC", "treatmentD"), class = "factor"),
     
     value = c(1.74501, 2.04001, 1.89501, 1.84001, 
     1.89501, 9.75001, 8.50001, 8.80001, 11.50001, 10.25001, 7.90001, 
     9.25001, 11.45001, 7.75001, 7.75001, 7.55001, 8.70001, 8.20001, 
     6.95001, 6.60001, 7.40001, 7.15001, 8.25001, 9.20001, 8.95001, 
     6.45001, 6.05001, 5.40001, 7.95001, 6.80001, 4.65001, 6.40001, 
     6.40001, 6.70001, 5.40001, 3.20001, 2.70001, 4.30001, 4.10001, 
     3.60001, 4.00001, 3.00001, 4.70001, 3.10001, 3.50001, 6.45001, 
     5.45001, 4.90001, 7.25001, 4.55001, 4.70001, 6.25001, 5.65001, 
     6.00001, 5.10001)),
     
     row.names = c(NA, -55L), class = c("tbl_df", "tbl", "data.frame"))

data2 <- structure(list(
     treatment = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 
     2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
     4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 
     5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L),
     .Label = c("negative_ctrl", "positive_ctrl", "treatmentA", "treatmentB", "treatmentC", "treatmentD"), class = "factor"), 
    
     value = c(1.00001, 1.00001, 1.00001, 1.00001, 1.00001, 6.77501, 
     5.68751, 5.99201, 8.24501, 7.01251, 4.79501, 5.99126, 8.26276, 
     5.35376, 5.38751, 4.60251, 5.38901, 4.85201, 4.44401, 5.20501, 
     6.20701, 5.77001, 4.05201, 3.65126, 3.02401, 4.68351, 3.90001, 
     2.56951, 3.70001, 3.61901, 3.96401, 2.93601, 1.53901, 1.40801, 
     2.05601, 2.08501, 1.89701, 1.79501, 1.50001, 2.09151, 1.53551, 
     1.57501, 3.88851, 3.09151, 2.75501, 4.40626, 2.42001, 2.60951, 
     3.83501, 3.37151, 3.70001, 2.92701)),
     
     row.names = c(NA, -52L), class = c("tbl_df", "tbl", "data.frame"))

I run the most basic analysis on both datasets:

library(Rmpfr)
library(ggstatsplot)

ggstatsplot::ggbetweenstats(
     data = data1, 
     x = treatment, 
     y = value,
     messages = FALSE )

ggstatsplot::ggbetweenstats(
     data = data2, 
     x = treatment, 
     y = value,
     messages = FALSE )

For data1 I get the expected plot.

For data2 I get:

> Error in stats::optim(par = 1.1 * rep(lambda, 2), fn = function(x) { : non-finite value supplied by optim

At first I thought the issue might be a few zeros in the negative control, so I raised those values, first by a tiny amount and then by 1, to rule out the range of the values as the cause. The only other difference I can see is that data2 has only 7 measurements for treatmentA (level 3) instead of 10, because I had to remove a few NAs after sample failures. However, the negative control (level 1) has only 5 values in both datasets, and I don't think unequal group sizes are a problem for this type of analysis.
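A quick base-R check supports the point about group sizes. This sketch does not call ggstatsplot; it runs `stats::oneway.test()` with `var.equal = FALSE` (Welch's one-way ANOVA, which assumes neither equal variances nor equal group sizes) on made-up groups of 5, 7 and 10 observations, and the F statistic and degrees of freedom come out finite.

```r
# Made-up data for illustration: three groups of unequal size,
# each with some spread in its values.
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), times = c(5, 7, 10))),
  value = c(1.7, 2.0, 1.9, 1.8, 1.9,
            7.6, 8.7, 8.2, 7.0, 6.6, 7.4, 7.2,
            6.5, 6.1, 5.4, 8.0, 6.8, 4.7, 6.4, 6.4, 6.7, 5.4)
)

# Welch's one-way ANOVA handles unbalanced groups without complaint.
welch <- stats::oneway.test(value ~ group, data = toy, var.equal = FALSE)
welch$statistic   # finite F statistic
welch$parameter   # numerator and (non-integer) denominator df
```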

Jonny Phelps · Accepted Answer

In cases like this it's a good idea to try basic plots first, e.g. by isolating the boxplots. Comparing the two datasets:

boxplot(value ~ treatment, data=data1)
boxplot(value ~ treatment, data=data2)
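The same comparison can be made numerically. Below is a minimal base-R sketch with a hypothetical helper, `group_summary()`, run on a made-up data frame shaped like data2 (a constant negative control and unequal group sizes). It tabulates size, mean and standard deviation per treatment; an SD of 0 (or `NA` for a single-value group) is the numeric version of the flat box in the plot.

```r
# Hypothetical helper: group size, mean and SD of `value` within each level of `group`.
group_summary <- function(data, group, value) {
  g <- factor(data[[group]])   # re-factoring drops unused levels
  y <- data[[value]]
  data.frame(
    group = levels(g),
    n     = as.vector(tapply(y, g, length)),
    mean  = as.vector(tapply(y, g, mean)),
    sd    = as.vector(tapply(y, g, stats::sd)),
    stringsAsFactors = FALSE
  )
}

# Made-up data shaped like data2: constant negative control, unequal group sizes.
toy <- data.frame(
  treatment = factor(rep(c("negative_ctrl", "positive_ctrl", "treatmentA"),
                         times = c(5, 6, 4))),
  value = c(1, 1, 1, 1, 1,
            6.8, 5.7, 6.0, 8.2, 7.0, 4.8,
            5.4, 4.6, 5.4, 4.9)
)

summ <- group_summary(toy, "treatment", "value")
print(summ)

# Groups with zero (or undefined, n = 1) spread are the ones to worry about.
flagged <- summ$group[is.na(summ$sd) | summ$sd == 0]
flagged
```

Unequal `n` values in the summary are harmless; only the zero SD stands out.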

data2 has a treatment ("negative_ctrl") with no variability at all: its SD is 0. I'm guessing this function runs some tests that require variation. You will need to read the function's documentation to see whether this case is addressed, but you can still get plots either by removing such treatments or by forcing a very small amount of variation, e.g.

# run without negative_ctrl
ggstatsplot::ggbetweenstats(
  data = data2[data2$treatment != "negative_ctrl",], 
  x = treatment, 
  y = value,
  messages = FALSE )
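The removal workaround can be generalised so it doesn't depend on knowing in advance which treatment is constant. This is a minimal base-R sketch with a hypothetical helper, `drop_constant_groups()`; the `tol` argument is my own assumption, not anything from ggstatsplot. It removes every group whose SD is at or below `tol` (or undefined), then calls `droplevels()` so the removed level does not linger in the factor.

```r
# Hypothetical helper: drop groups with no spread, plus their unused factor levels.
drop_constant_groups <- function(data, group, value, tol = 0) {
  sds <- tapply(data[[value]], data[[group]], stats::sd)
  constant <- names(sds)[is.na(sds) | sds <= tol]
  if (length(constant) > 0) {
    message("Dropping constant group(s): ", paste(constant, collapse = ", "))
  }
  kept <- data[!(as.character(data[[group]]) %in% constant), ]
  kept[[group]] <- droplevels(as.factor(kept[[group]]))
  kept
}

# Made-up example data with one constant group.
toy <- data.frame(
  treatment = factor(rep(c("negative_ctrl", "treatmentA", "treatmentB"), each = 4)),
  value = c(1, 1, 1, 1, 5.4, 4.6, 5.4, 4.9, 3.0, 3.9, 2.6, 3.7)
)
cleaned <- drop_constant_groups(toy, "treatment", "value")
levels(cleaned$treatment)

# The cleaned data would then be passed on as in the answer, e.g.
# ggstatsplot::ggbetweenstats(data = cleaned, x = treatment, y = value)
```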

# add some tiny fake variation to force it through (this is a hack)
data3 <- data2
data3$value[which(data3$treatment == "negative_ctrl")[1]] <- 1.0001
ggstatsplot::ggbetweenstats(
  data = data3, 
  x = treatment, 
  y = value,
  messages = FALSE )
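Why would a single tiny change be enough? A plausible mechanism, offered as an assumption rather than a confirmed diagnosis: as far as I know, ggbetweenstats' default parametric test is Welch's one-way ANOVA (`var.equal = FALSE`), which weights each group by n_i / s_i^2. A group with zero SD gets an infinite weight, and the F statistic becomes `NaN`; the `optim()` call in the error message presumably belongs to a later step (for example an effect-size or confidence-interval computation) that receives this non-finite value, but I have not traced it in the ggstatsplot source. The sketch below reproduces only the first part with `stats::oneway.test()` on made-up data.

```r
# Made-up data: "ctrl" is constant, like negative_ctrl in data2.
toy <- data.frame(
  group = factor(rep(c("ctrl", "trtA", "trtB"), each = 5)),
  value = c(1, 1, 1, 1, 1,
            6.8, 5.7, 6.0, 8.2, 7.0,
            4.6, 5.4, 4.9, 4.4, 5.2)
)

# Welch's ANOVA weights each group by n_i / s_i^2; the constant group gets Inf.
w <- with(toy, tapply(value, group, length) / tapply(value, group, var))
print(w)

# Welch's one-way ANOVA: Inf / Inf arithmetic turns the F statistic into NaN.
welch <- stats::oneway.test(value ~ group, data = toy, var.equal = FALSE)

# Pooled-variance (Fisher) ANOVA stays finite: the zero variance only
# lowers the pooled estimate, which the other groups keep above zero.
fisher <- stats::oneway.test(value ~ group, data = toy, var.equal = TRUE)

# After a hack like the one above, one value differs slightly, the variance
# is positive and every weight is finite again.
toy$value[1] <- 1.0001
welch_hacked <- stats::oneway.test(value ~ group, data = toy, var.equal = FALSE)

c(welch        = unname(welch$statistic),
  fisher       = unname(fisher$statistic),
  welch_hacked = unname(welch_hacked$statistic))
```

This also suggests why the hack is fragile: the fake variation yields an enormous weight for the control group, so its statistics should not be interpreted.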
