sangui
sangui

Reputation: 21

Error in t.test.default(x = c(7.64221451305547, 7.64221451305547, 7.64221451305547, : data are essentially constant

I have problem with a t.test. I would like to study the differential expression of every of my protein in multiple case. To do this I wanted to do a t.test.

I build a dataset (just the head) :

 Protein     Norm Cultivar  Strain             CxS rep Treatment
1 TraesCS1A01G002200.1 8.955735  Cadenza MDC_FG1 Cadenza_MDC_FG1   1      Fusa
2 TraesCS1A01G005600.3 7.241294  Cadenza MDC_FG1 Cadenza_MDC_FG1   1      Fusa
3 TraesCS1A01G007400.1 7.009850  Cadenza MDC_FG1 Cadenza_MDC_FG1   1      Fusa
4 TraesCS1A01G007500.1 9.291952  Cadenza MDC_FG1 Cadenza_MDC_FG1   1      Fusa
5 TraesCS1A01G007700.1 7.988990  Cadenza MDC_FG1 Cadenza_MDC_FG1   1      Fusa
6 TraesCS1A01G009000.3 7.705927  Cadenza MDC_FG1 Cadenza_MDC_FG1   1      Fusa

The link of the full dataset is : https://github.com/sangui1216/protein-analysis It is the only csv.

I wanted to do a t.test on this data set, I did :

test%>%group_by(Protein) %>% do(tidy(t.test(Norm ~ Strain,data = .))) ->t_test_result

And at 48% I have this error message.

Error in t.test.default(x = c(7.64221451305547, 7.64221451305547, 7.64221451305547,  : 
  data are essentially constant

Can you help me ? My data set doesn't look like constant to me (but I have to many data to look one by one). Can I force the t.test to just ignore this part ?

Thank you very much,

Sangui

Thanks to the answers ! It works.

Upvotes: 2

Views: 122

Answers (2)

Sirius
Sirius

Reputation: 5429

It is indeed constant, have a look here:

library(data.table)
dat <- fread("https://github.com/sangui1216/protein-analysis/raw/main/test1.csv")
dat[ dat[, .I[var(Norm) == 0], by = Protein]$V1]
> dat[ dat[, .I[var(Norm) == 0], by = Protein]$V1]
        V1              Protein     Norm Cultivar  Strain              CxS rep Treatment
 1:   2499 TraesCS4A01G239100.2 7.642215  Cadenza MDC_FG1  Cadenza_MDC_FG1   1      Fusa
 2:   7657 TraesCS4A01G239100.2 7.642215  Cadenza MDC_FG1  Cadenza_MDC_FG1   2      Fusa
 3:  12815 TraesCS4A01G239100.2 7.642215  Cadenza MDC_FG1  Cadenza_MDC_FG1   3      Fusa
 4:  17973 TraesCS4A01G239100.2 7.642215  Cadenza control Cadenza_MDC_FU01   1      Fusa
 5:  23131 TraesCS4A01G239100.2 7.642215  Cadenza control Cadenza_MDC_FU01   2      Fusa
 6:  28289 TraesCS4A01G239100.2 7.642215  Cadenza control Cadenza_MDC_FU01   3      Fusa
 7:  33447 TraesCS4A01G239100.2 7.642215  Cadenza control Cadenza_MDC_FG13   1      Fusa
 8:  38605 TraesCS4A01G239100.2 7.642215  Cadenza control Cadenza_MDC_FG13   2      Fusa
 9:  43763 TraesCS4A01G239100.2 7.642215  Cadenza control Cadenza_MDC_FG13   3      Fusa
10:  48921 TraesCS4A01G239100.2 7.642215 aRecital MDC_FG1  Recital_MDC_FG1   1      Fusa
11:  54079 TraesCS4A01G239100.2 7.642215 aRecital MDC_FG1  Recital_MDC_FG1   2      Fusa
12:  59237 TraesCS4A01G239100.2 7.642215 aRecital MDC_FG1  Recital_MDC_FG1   3      Fusa
13:  64395 TraesCS4A01G239100.2 7.642215 aRecital control Recital_MDC_FU01   1      Fusa
14:  69553 TraesCS4A01G239100.2 7.642215 aRecital control Recital_MDC_FU01   2      Fusa
15:  74711 TraesCS4A01G239100.2 7.642215 aRecital control Recital_MDC_FU01   3      Fusa
16:  79869 TraesCS4A01G239100.2 7.642215 aRecital control Recital_MDC_FG13   1      Fusa
17:  85027 TraesCS4A01G239100.2 7.642215 aRecital control Recital_MDC_FG13   2      Fusa
18:  90185 TraesCS4A01G239100.2 7.642215 aRecital control Recital_MDC_FG13   3      Fusa
19:  95343 TraesCS4A01G239100.2 7.642215    Renan MDC_FG1    Renan_MDC_FG1   1      Fusa
20: 100501 TraesCS4A01G239100.2 7.642215    Renan MDC_FG1    Renan_MDC_FG1   2      Fusa
21: 105659 TraesCS4A01G239100.2 7.642215    Renan MDC_FG1    Renan_MDC_FG1   3      Fusa
22: 110817 TraesCS4A01G239100.2 7.642215    Renan control   Renan_MDC_FU01   1      Fusa
23: 115975 TraesCS4A01G239100.2 7.642215    Renan control   Renan_MDC_FU01   2      Fusa
24: 121133 TraesCS4A01G239100.2 7.642215    Renan control   Renan_MDC_FU01   3      Fusa
25: 126291 TraesCS4A01G239100.2 7.642215    Renan control   Renan_MDC_FG13   1      Fusa
26: 131449 TraesCS4A01G239100.2 7.642215    Renan control   Renan_MDC_FG13   2      Fusa
27: 136607 TraesCS4A01G239100.2 7.642215    Renan control   Renan_MDC_FG13   3      Fusa

Just filter away that protein, and run again.

Using data.table notation you can easily chain upon this to run the t.tests too, should take only a few seconds:


library(data.table)
library(broom)
dat <- fread("https://github.com/sangui1216/protein-analysis/raw/main/test1.csv")

system.time( t.test.results <- dat[ dat[, .I[var(Norm) > 0], by = Protein]$V1 ][
    , tidy( t.test( Norm ~ Strain ) ), by = Protein
] )

t.test.results[ order(p.value) ]


   user  system elapsed 
  2.922   0.000   2.833 

                   Protein      estimate estimate1 estimate2    statistic      p.value parameter   conf.low   conf.high                  method alternative
   1: TraesCS2A01G315600.1 -3.602881e-01  8.180318  8.540606 -8.856018988 2.590483e-08  19.75920 -0.4452173 -0.27535878 Welch Two Sample t-test   two.sided
   2: TraesCS1B01G107600.1 -2.154759e-01  7.944446  8.159922 -7.939083661 2.919241e-08  24.72827 -0.2714053 -0.15954648 Welch Two Sample t-test   two.sided
   3: TraesCS2B01G333900.1 -4.974910e-01  7.755210  8.252701 -7.648981619 1.725526e-07  20.91120 -0.6327845 -0.36219763 Welch Two Sample t-test   two.sided
   4: TraesCS6D01G302300.2 -3.101279e-01  8.955777  9.265905 -7.319162710 3.734485e-07  20.60395 -0.3983486 -0.22190713 Welch Two Sample t-test   two.sided
   5: TraesCS1B01G090600.1 -2.073122e-01  9.016683  9.223996 -7.629576322 8.048855e-07  16.60394 -0.2647448 -0.14987957 Welch Two Sample t-test   two.sided
  ---                                                                                                                                                      
5153: TraesCS1A01G250300.1  1.772189e-04  8.506596  8.506419  0.003487947 9.972537e-01  18.83739 -0.1062293  0.10658371 Welch Two Sample t-test   two.sided
5154: TraesCS4A01G042600.1 -7.410061e-04  7.107547  7.108288 -0.003026521 9.976295e-01  13.47675 -0.5277846  0.52630255 Welch Two Sample t-test   two.sided
5155: TraesCS3A01G275700.2 -1.007336e-04  8.006658  8.006758 -0.002819725 9.977826e-01  17.24193 -0.0753926  0.07519114 Welch Two Sample t-test   two.sided
5156: TraesCS4B01G293000.1  2.161934e-04  7.310060  7.309844  0.001284415 9.989894e-01  17.90796 -0.3535424  0.35397481 Welch Two Sample t-test   two.sided
5157: TraesCS1B01G059800.2 -7.504112e-05  7.212239  7.212314 -0.001240131 9.990325e-01  11.09887 -0.1331134  0.13296331 Welch Two Sample t-test   two.sided


Upvotes: 1

Rui Barradas
Rui Barradas

Reputation: 76621

Here is a way of running the code until the end with tryCatch.

library(tidyverse)
library(generics)

test %>%
  group_by(Protein) %>% 
  do(tidy(
    tryCatch(
      t.test(Norm ~ Strain, data = .),
      error = function(e) NA
    )
  )) -> t_test_result

head(t_test_result)
## A tibble: 6 x 12
## Groups:   Protein [6]
#  Protein        estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method        alternative x    
#  <chr>             <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>         <chr>       <lgl>
#1 TraesCS1A01G0…  -0.0285      8.70      8.73    -0.302  0.767      14.8  -0.230      0.173  Welch Two Sa… two.sided   NA   
#2 TraesCS1A01G0…   0.0429      7.04      7.00     0.307  0.765       9.78 -0.270      0.356  Welch Two Sa… two.sided   NA   
#3 TraesCS1A01G0…  -0.188       5.84      6.02    -0.444  0.663      17.1  -1.08       0.705  Welch Two Sa… two.sided   NA   
#4 TraesCS1A01G0…   0.0476      9.33      9.29     1.96   0.0616     24.9  -0.00250    0.0978 Welch Two Sa… two.sided   NA   
#5 TraesCS1A01G0…   0.209       8.09      7.88     1.62   0.137      10.2  -0.0787     0.498  Welch Two Sa… two.sided   NA   
#6 TraesCS1A01G0…   0.0673      8.05      7.99     1.00   0.334      12.3  -0.0784     0.213  Welch Two Sa… two.sided   NA   

To know what is the Protein with all values equal, run

which(tapply(test$Norm, test$Protein, var) == 0)
#TraesCS4A01G239100.2 
#                2499 

Upvotes: 0

Related Questions