BISEP

Reputation: 11

R: compare the means of two datasets with the bootstrap method

I have two datasets whose values are arranged in bins. I want to compare the mean of each bin between dataset1 and dataset2 and visualize the result as a line plot, or with any other suitable method. I am new to this kind of analysis, so any suggestion will be very helpful.

In the actual data, both datasets have 200 bins (bin1 to bin200) rather than the 5 shown here, and the number of rows varies (between 200 and 300), so I have included a small sample dataset below. I want to use the bootstrap method: draw a random sample (for example, 100 rows) from each dataset and take the mean of every bin for dataset1 and dataset2. My reason for bootstrapping is that, although the bins are the same in both datasets, the number of rows differs, which may interfere with comparing the means. In addition, outliers (extremely low and high values) across the bins may distort the result.

Any suggestions on how I can do this in R? I am new to R and still learning, so please help.

 dataset1=structure(list(genenames = c("data1", "data2", "data3", "data4", "data5", "data6"), 
      bin1 = c(0,20,9,0,2,0), 
      bin2 = c(5,20,8,30,10,0), 
      bin3 = c(0,0,1,1,3,0),
      bin4 =c(6, 20, 10, 5, 0, 1),
      bin5 =c(10,15,30,10,9, 4)), 
      class = "data.frame", row.names = c(NA, -6L))

dataset2=structure(list(genenames = c("data10", "data11", "data12", "data13", "data14", "data15"), 
      bin1 = c(0,30,0,0,20,0), 
      bin2 = c(0,0,8,10,20,0), 
      bin3 = c(0,10,19,15,3,10),
      bin4 =c(30, 0, 0, 25, 0, 20),
      bin5 =c(0,5,0,20,30, 29)), 
      class = "data.frame", row.names = c(NA, -6L))

dataset1_mean=colMeans(dataset1[,-1])
dataset2_mean=colMeans(dataset2[,-1])

Is there a statistical method to remove these outliers, and is there any problem with using the bootstrap method here? Thank you.

Upvotes: 0

Views: 262

Answers (1)

TarJae

Reputation: 79184

Here is one way: After some data wrangling you could use boxplot and mark the mean with a red point:

library(dplyr)
library(ggplot2)
library(tidyr)

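# Tag each data frame so its rows can be told apart after binding
dataset1 <- dataset1 %>% 
    mutate(df = "df1")

dataset2 <- dataset2 %>% 
    mutate(df = "df2")

# Stack both data frames, reshape the bin columns to long format,
# then draw one boxplot per data frame (values pooled over all bins)
bind_rows(dataset1, dataset2) %>% 
    pivot_longer(
        cols = starts_with("bin"),
        names_to = "name",
        values_to = "value"
    ) %>% 
    ggplot(aes(df, value))+
    geom_boxplot() +
    # Mark the mean of each group with a red point
    stat_summary(fun=mean, 
                 geom="point",
                 shape=20, 
                 size=4, 
                 color="red", 
                 position = position_dodge2 (width = 0.7, preserve = "single"))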

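The boxplot above pools all bins, so it does not yet do the bootstrap the question asks about. Below is a minimal sketch of one possible approach in base R, not a definitive implementation: resample rows with replacement, compute a summary of every bin in each resample, and report the average of those summaries together with a 95% percentile interval. The helper name `boot_bin_means` and its arguments `n_boot`, `n_draw` and `stat` are assumptions introduced here for illustration. The data frames repeat the sample data from the question, without the `genenames` column.

```r
# Sample data from the question (gene names dropped, only the bins are used)
dataset1 <- data.frame(
  bin1 = c(0, 20, 9, 0, 2, 0),
  bin2 = c(5, 20, 8, 30, 10, 0),
  bin3 = c(0, 0, 1, 1, 3, 0),
  bin4 = c(6, 20, 10, 5, 0, 1),
  bin5 = c(10, 15, 30, 10, 9, 4)
)
dataset2 <- data.frame(
  bin1 = c(0, 30, 0, 0, 20, 0),
  bin2 = c(0, 0, 8, 10, 20, 0),
  bin3 = c(0, 10, 19, 15, 3, 10),
  bin4 = c(30, 0, 0, 25, 0, 20),
  bin5 = c(0, 5, 0, 20, 30, 29)
)

# Hypothetical helper: bootstrap a per-bin summary of one data frame.
#   n_boot: number of bootstrap replicates
#   n_draw: rows drawn (with replacement) per replicate
#   stat:   summary applied to each bin, e.g. mean or median
boot_bin_means <- function(df, n_boot = 1000, n_draw = nrow(df), stat = mean) {
  reps <- replicate(n_boot, {
    idx <- sample(nrow(df), size = n_draw, replace = TRUE)
    vapply(df[idx, , drop = FALSE], stat, numeric(1))
  })
  # reps is a matrix with one row per bin and one column per replicate
  data.frame(
    bin   = rownames(reps),
    mean  = rowMeans(reps),
    lower = apply(reps, 1, quantile, probs = 0.025),
    upper = apply(reps, 1, quantile, probs = 0.975),
    row.names = NULL
  )
}

set.seed(42)  # make the resampling reproducible
res1 <- boot_bin_means(dataset1)
res2 <- boot_bin_means(dataset2)

# One row per bin and dataset, ready for a line plot with error bands
res1$dataset <- "dataset1"
res2$dataset <- "dataset2"
result <- rbind(res1, res2)
print(result)
```

With the real data you could pass `n_draw = 100` to draw 100 rows per replicate, as described in the question, although the usual bootstrap draws as many rows as the dataset has. If outliers are a concern, passing `stat = median` is one option that avoids deleting any values. The `mean`, `lower` and `upper` columns of `result` can then be drawn per bin as a line plot with one line per dataset.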

Upvotes: 0
