Reputation: 11
In a project, I measured the iodine concentration of tumors (column=ROI_IC) at different off center positions (column=Offcenter) (table heights) in a CT scanner. I know the true concentration of each of the tumors (column=Real_IC; there are 4 different tumors with 4 different real_IC concentrations). Each tumor was measured at each off-center position 10 times (column=Measurement_repeat). I calculated an absolute error between the measured iodine concentration and the real iodine concentration (column=absError_IC)
This is just the head of the data:
Offcenter Measurement_repeat Real_IC ROI_IC absError_IC 1 0 1 0.0 0.4 0.4 2 0 2 0.0 0.3 0.3 3 0 3 0.0 0.3 0.3 4 0 4 0.0 0.0 0.0 5 0 5 0.0 0.0 0.0 6 0 6 0.0 -0.1 0.1 7 0 7 0.0 -0.2 0.2 8 0 8 0.0 -0.2 0.2 9 0 9 0.0 -0.1 0.1 10 0 10 0.0 0.0 0.0 11 0 1 0.4 0.4 0.0 12 0 2 0.4 0.3 0.1 13 0 3 0.4 0.2 0.2 14 0 4 0.4 0.0 0.4 15 0 5 0.4 0.0 0.4 16 0 6 0.4 -0.1 0.5 17 0 7 0.4 0.1 0.3 18 0 8 0.4 0.3 0.1 19 0 9 0.4 0.6 0.2 20 0 10 0.4 0.7 0.3
Now I would like to create a new column called corrError_IC.
In this column, the measured iodine concentration (ROI_IC) should be corrected based on the mean absolute error (mean of 10 measurements) that was found for that specific Real_IC concentration at Offcenter = 0
Because there are 4 tumor concentrations there are 4 mean values at Off-center =0 that I want to apply on the other off-center-values.
mean1=mean of the 10 absError-IC measurements of the `Real_IC=0`
mean2=mean of the 10 absError-IC measurements of the `Real_IC=0.4`
mean3=mean of the 10 absError-IC measurements of the `Real_IC=3`
mean4=mean of the 10 absError-IC measurements of the `Real_IC=5`
Basically, I want the average absolute error for a specific tumor at Offcenter = 0
(there are 4 different tumor types with four different Real_IC) and then I want correct all tumors at the other Offcenter positions by this absolute error values that were derived from the Offcenter = 0
data.
I tried ifelse
statements but I was not able to figure it out.
EDIT: Off-center has specific levels: c(-6,-4,-3,-2,-1,0,1,2,3,4,6)
Upvotes: 0
Views: 129
Reputation: 63
Here is how I would approach this problem.
absError_IC
grouped by Real_IC
.Code Example
## replicate sample data sets
ROI_IC = c(0.4, 0.3, 0.3, 0.0, 0.0, -0.1, -0.2, -0.2, -0.1, 0.0,
0.4, 0.3, 0.2, 0.0, 0.0, -0.1, 0.1, 0.3, 0.6, 0.7)
df = data.frame("Offcenter"=rep(0, 40),
"Measurement_repeat"=rep( c(1:10), 4),
"Real_IC"=rep( c(0,0.4,3,5), each=10),
"ROI_IC"=rep(ROI_IC, 2),
stringsAsFactors=F)
df$absError_IC = abs(df$Real_IC - df$ROI_IC)
## compute mean of "absError_IC" grouped by "Real_IC"
mean_values = aggregate(df[df$Offcenter==0, c("absError_IC")],
by=list("Real_IC"=df$Real_IC),
FUN=mean)
names(mean_values)[which(names(mean_values)=="x")] = "MAE"
## left join to append column
df = merge(df, mean_values, by.x="Real_IC", by.y="Real_IC", all.x=T, all.y=F, sort=F)
## notice that column order shifts based on "key"
df[c(1:5, 10:15), ]
I suggest using data.table
package which is particularly useful when there is need to manipulate large data.
library(data.table)
## dt = data.table(df) or dt = fread(<path>)
## dt[dt$Offcenter==0, c("absError_IC") := abs(dt$Real_IC - dt$ROI_IC)]
## compute grouped mean
mean_values = dt[, j=list("MAE"=mean(absError_IC)), by=list(Real_IC)]
## left join
dt = merge(dt, mean_values, by.x="Real_IC", by.y="Real_IC", all.x=T, all.y=F, sort=F)
Upvotes: 1
Reputation: 11
I found a way to compute what I want by creating an extra column taking the average absolute errors from the 4 Real_IC levels for Off-center = 0 and matching them whenever Real_IC has a certain level. In a second step, I subtract these from the ROI_ICs. However, how can I simplify that code to a more general form (at the moment I calculate the average absErrors based on their row location)? Sorry I am an absolute beginner ;(
Of note: My data.frame is called "ds_M"
#Define absolute errors for the 4 Real_IC levels as variables
average1<-mean(ds_M$absError_IC[1:10]) #for Real_IC=0
average2<-mean(ds_M$absError_IC[11:20]) #for Real_IC=0.4
average3<-mean(ds_M$absError_IC[21:30]) #for Real_IC=3
average4<-mean(ds_M$absError_IC[31:40]) #for Real_IC=5
# New column assigning the correction factor to each Real_IC level
ds_M$absCorr[ds_M$Real_IC==0]<-average1
ds_M$absCorr[ds_M$Real_IC==0.4]<-average2
ds_M$absCorr[ds_M$Real_IC==3]<-average3
ds_M$absCorr[ds_M$Real_IC==5]<-average4
# Calculate new column with corrected ROI_ICs
ds_M$corrError_IC<-ds_M$ROI_IC - ds_M$absCorr
Upvotes: 0
Reputation: 107747
Consider ave
for inline aggregation where its first argument is the numeric quantity field, next arguments is grouping fields, and very last argument requiring named parameter, FUN
, is the numeric function: ave(num_vector, ..., FUN=func)
.
df$corrError_IC <- with(df, ave(absError_IC, Real_IC, FUN=mean))
To handle NAs, extend the function argument for na.rm
argument:
df$corrError_IC <- with(df, ave(absError_IC, Real_IC, FUN=function(x) mean(x, na.rm=TRUE))
Upvotes: 0