Reputation: 11
I have two datasets whose values are arranged in bins. I want to compare the per-bin means of dataset1 and dataset2 and visualize them in a line plot, or by any other suitable method. I am new to this kind of analysis, so any suggestion would be very helpful.
My actual data is larger than the sample shown below: both datasets have bins bin1 to bin200, and the number of rows varies (200 to 300). I would like to use a bootstrap approach: draw a random sample of, for example, 100 rows from each dataset and take the mean across all bins for dataset1 and dataset2. My reason for bootstrapping is that, although the bins are the same in both datasets, the number of rows differs, which may interfere with the means. Outliers (extremely low and high values across the bins) may also alter the result.
Any suggestions on how I can do this in R? I am new to R and still learning, so please help.
dataset1 <- structure(list(
  genenames = c("data1", "data2", "data3", "data4", "data5", "data6"),
  bin1 = c(0, 20, 9, 0, 2, 0),
  bin2 = c(5, 20, 8, 30, 10, 0),
  bin3 = c(0, 0, 1, 1, 3, 0),
  bin4 = c(6, 20, 10, 5, 0, 1),
  bin5 = c(10, 15, 30, 10, 9, 4)),
  class = "data.frame", row.names = c(NA, -6L))
dataset2 <- structure(list(
  genenames = c("data10", "data11", "data12", "data13", "data14", "data15"),
  bin1 = c(0, 30, 0, 0, 20, 0),
  bin2 = c(0, 0, 8, 10, 20, 0),
  bin3 = c(0, 10, 19, 15, 3, 10),
  bin4 = c(30, 0, 0, 25, 0, 20),
  bin5 = c(0, 5, 0, 20, 30, 29)),
  class = "data.frame", row.names = c(NA, -6L))
dataset1_mean <- colMeans(dataset1[, -1])
dataset2_mean <- colMeans(dataset2[, -1])
Is there any statistical method to remove these outliers, or is there any problem with using the bootstrap method here? Thank you.
Upvotes: 0
Views: 262
Reputation: 79184
Here is one way: after some data wrangling, you could use a boxplot and mark the mean with a red point:
library(dplyr)
library(ggplot2)
library(tidyr)
dataset1 <- dataset1 %>%
  mutate(df = "df1")
dataset2 <- dataset2 %>%
  mutate(df = "df2")
bind_rows(dataset1, dataset2) %>%
  pivot_longer(
    cols = starts_with("bin"),
    names_to = "name",
    values_to = "value"
  ) %>%
  ggplot(aes(df, value)) +
  geom_boxplot() +
  stat_summary(fun = mean,
               geom = "point",
               shape = 20,
               size = 4,
               color = "red",
               position = position_dodge2(width = 0.7, preserve = "single"))
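For the bootstrap part of the question, here is a minimal base R sketch of one possible approach, not a definitive method. It resamples rows with replacement, recomputes the per-bin means each time, and reports a percentile interval per bin, then draws both datasets as lines. The helper name `boot_bin_means` and its arguments `n_boot` and `n_rows` are my own, made up for illustration. The six-row sample is far too small for the intervals to mean much; the sketch only shows the mechanics.

```r
# Sample data from the question
dataset1 <- data.frame(
  genenames = paste0("data", 1:6),
  bin1 = c(0, 20, 9, 0, 2, 0),
  bin2 = c(5, 20, 8, 30, 10, 0),
  bin3 = c(0, 0, 1, 1, 3, 0),
  bin4 = c(6, 20, 10, 5, 0, 1),
  bin5 = c(10, 15, 30, 10, 9, 4)
)
dataset2 <- data.frame(
  genenames = paste0("data", 10:15),
  bin1 = c(0, 30, 0, 0, 20, 0),
  bin2 = c(0, 0, 8, 10, 20, 0),
  bin3 = c(0, 10, 19, 15, 3, 10),
  bin4 = c(30, 0, 0, 25, 0, 20),
  bin5 = c(0, 5, 0, 20, 30, 29)
)

# Hypothetical helper: bootstrap the per-bin mean of one dataset.
# n_boot = number of resamples, n_rows = rows drawn per resample
# (defaults to the dataset's own row count, the usual bootstrap choice).
boot_bin_means <- function(df, n_boot = 1000, n_rows = nrow(df)) {
  bins <- as.matrix(df[, grepl("^bin", names(df)), drop = FALSE])
  # One column of per-bin means per resample: a (bins x n_boot) matrix
  reps <- matrix(
    replicate(n_boot, {
      idx <- sample(nrow(bins), size = n_rows, replace = TRUE)
      colMeans(bins[idx, , drop = FALSE])
    }),
    nrow = ncol(bins)
  )
  data.frame(
    bin   = colnames(bins),
    mean  = unname(colMeans(bins)),
    lower = apply(reps, 1, quantile, probs = 0.025, names = FALSE),
    upper = apply(reps, 1, quantile, probs = 0.975, names = FALSE)
  )
}

set.seed(1)  # make the resampling reproducible
b1 <- boot_bin_means(dataset1)
b2 <- boot_bin_means(dataset2)

# Line plot of per-bin means with 95% percentile intervals
x <- seq_len(nrow(b1))
plot(x, b1$mean, type = "b", col = "blue", xaxt = "n",
     ylim = range(b1$lower, b1$upper, b2$lower, b2$upper),
     xlab = "bin", ylab = "mean (95% bootstrap interval)")
axis(1, at = x, labels = b1$bin)
lines(x, b2$mean, type = "b", col = "red")
segments(x, b1$lower, x, b1$upper, col = "blue")
segments(x, b2$lower, x, b2$upper, col = "red")
legend("topleft", legend = c("dataset1", "dataset2"),
       col = c("blue", "red"), lty = 1)
```

A differing number of rows does not by itself bias a mean, so drawing exactly 100 rows from each dataset is optional; pass `n_rows = 100` if you still want that. For the outliers, rather than removing values, you could swap `colMeans` for a more robust summary such as the median or a trimmed mean (`mean(x, trim = 0.1)`) applied per column.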
Upvotes: 0