Armelle Quinn

Reputation: 67

How to calculate the average of experimental values for replicates of the same sample, without knowing the number of replicates in advance?

I have a CSV file containing experimental values for many samples, some of which have replicates. For replicated samples, I only want to use the mean of the replicates belonging to the same sample. The problem is that the number of replicates varies: it can be 2, 3, 4, etc.

My code isn't right, because it could only work when the number of replicates is 2 (I am using a loop to compare each sample ID to the previous one). On top of that, it doesn't work at all: it assigns the same average value to all my samples. I think there is also a problem at the start of the loop, because when x=1, x-1=0, which doesn't correspond to any value, so that may be why the code fails. I am a beginner in R and have never had any courses or training; I am trying to learn it by myself, so thank you in advance for your help.

My dataset:

[Image: input dataset]

Expected output:

[Image: expected output]

PS: in this example the number of replicates is 2. However, it can differ between samples: sometimes it's 2, sometimes 3, 4, etc.

for (x in length(dat$Sample)){
  if (dat$Sample[x]==dat$Sample[x-1]){
    dat$Average.OD[x-1] <- mean(dat$OD[x], dat$OD[x-1])
    dat$Average.OD[x] <- NA
  }
}

Upvotes: 0

Views: 875

Answers (2)

ealbsho93

Reputation: 141

Here is a possible solution using data.table.

#Data
data <- data.frame('Sample'=c('Blank','Blank','STD1','STD1'),
                   'OD'=c(0.07,0.08,0.09,0.10))

#Code
#Load data.table and convert our data to a data.table (by reference).
library(data.table)
setDT(data)

#Find the average of OD by the Sample column. Here Sample is the grouping column. If you want to group by both Sample and Replicates, pass both of them in by, and so on.
data[, AverageOD := mean(OD), by = c("Sample")]

#Set AverageOD to NA on every duplicated Sample row, keeping it only on the first row of each Sample.
data[duplicated(data, by = c("Sample")), AverageOD := NA]

#Rename the column AverageOD to 'Average OD'.
setnames(data, "AverageOD", "Average OD")
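Since the question is about a varying number of replicates, here is a small sketch showing that the same grouped approach does not depend on the group size. The data below is hypothetical (the sample names, OD values, and the object name `dt` are made up for illustration), with 2, 3, and 1 replicates respectively.

```r
library(data.table)

# Hypothetical data: 'Blank' has 2 replicates, 'STD1' has 3, 'STD2' has 1.
dt <- data.table(
  Sample = c("Blank", "Blank", "STD1", "STD1", "STD1", "STD2"),
  OD     = c(0.07, 0.08, 0.09, 0.10, 0.11, 0.20)
)

# The group mean is computed per Sample, whatever the number of rows in the group.
dt[, AverageOD := mean(OD), by = Sample]

# Keep the mean only on the first row of each Sample; later rows become NA.
dt[duplicated(Sample), AverageOD := NA]

print(dt)
```

A sample with a single measurement simply keeps its own value as the average.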

Let me know if you have any questions.

Upvotes: 1

G5W

Reputation: 37641

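You can do this without any looping using aggregate and merge. Since you do not provide any data, I illustrate with a simple example. Note that in this toy data OD plays the role of the grouping column and Sample holds the values being averaged; swap the roles for your own data.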

## Example data
set.seed(123)
Sample = round(runif(10), 1)
OD = sample(4, 10, replace=TRUE)
dat = data.frame(OD, Sample)

Means = aggregate(dat$Sample, list(dat$OD), mean, na.rm=TRUE)
names(Means) = c("OD", "mean")
Means
  OD      mean
1  1 0.9000000
2  2 0.7000000
3  3 0.3666667
4  4 0.4000000

merge(dat, Means, "OD")
   OD Sample      mean
1   1    0.9 0.9000000
2   1    0.9 0.9000000
3   2    0.8 0.7000000
4   2    0.9 0.7000000
5   2    0.4 0.7000000
6   3    0.0 0.3666667
7   3    0.6 0.3666667
8   3    0.5 0.3666667
9   4    0.3 0.4000000
10  4    0.5 0.4000000
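If the goal is the layout described in the question (the mean shown once per sample and NA on the remaining replicate rows), a base R sketch along the following lines may be closer. The data is hypothetical, and the column names Sample, OD and Average.OD are taken from the question's loop; ave() and duplicated() are standard base R functions.

```r
# Hypothetical data with uneven replicate counts (2, 3 and 1).
dat <- data.frame(
  Sample = c("Blank", "Blank", "STD1", "STD1", "STD1", "STD2"),
  OD     = c(0.07, 0.08, 0.09, 0.10, 0.11, 0.20)
)

# ave() returns the group mean aligned with every row of dat.
dat$Average.OD <- ave(dat$OD, dat$Sample, FUN = mean)

# Blank out all but the first row of each Sample.
dat$Average.OD[duplicated(dat$Sample)] <- NA

print(dat)
```

Two details in the original loop are worth knowing regardless of the approach: for (x in length(dat$Sample)) runs only once, with x equal to the number of rows, and mean(a, b) does not average two numbers because the second positional argument of mean() is trim, so mean(c(a, b)) would be needed.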

Upvotes: 1
