Reputation: 11

R - How to create a new column in a dataframe with calculations based on condition of another column

In a project, I measured the iodine concentration of tumors (column ROI_IC) at different off-center positions, i.e. table heights (column Offcenter), in a CT scanner. I know the true concentration of each tumor (column Real_IC; there are 4 different tumors with 4 different Real_IC concentrations). Each tumor was measured 10 times at each off-center position (column Measurement_repeat). I calculated the absolute error between the measured and the real iodine concentration (column absError_IC).

Here are the first 20 rows of the data:

   Offcenter Measurement_repeat Real_IC ROI_IC absError_IC
1          0                  1     0.0    0.4         0.4
2          0                  2     0.0    0.3         0.3
3          0                  3     0.0    0.3         0.3
4          0                  4     0.0    0.0         0.0
5          0                  5     0.0    0.0         0.0
6          0                  6     0.0   -0.1         0.1
7          0                  7     0.0   -0.2         0.2
8          0                  8     0.0   -0.2         0.2
9          0                  9     0.0   -0.1         0.1
10         0                 10     0.0    0.0         0.0
11         0                  1     0.4    0.4         0.0
12         0                  2     0.4    0.3         0.1
13         0                  3     0.4    0.2         0.2
14         0                  4     0.4    0.0         0.4
15         0                  5     0.4    0.0         0.4
16         0                  6     0.4   -0.1         0.5
17         0                  7     0.4    0.1         0.3
18         0                  8     0.4    0.3         0.1
19         0                  9     0.4    0.6         0.2
20         0                 10     0.4    0.7         0.3

Now I would like to create a new column called corrError_IC.
In this column, the measured iodine concentration (ROI_IC) should be corrected by the mean absolute error (the mean of the 10 repeated measurements) found for the same Real_IC concentration at Offcenter = 0.

Because there are 4 tumor concentrations, there are 4 mean values at Offcenter = 0 that I want to apply to the other off-center positions.

mean1 = mean of the 10 absError_IC measurements for `Real_IC = 0` at Offcenter = 0

mean2 = mean of the 10 absError_IC measurements for `Real_IC = 0.4` at Offcenter = 0

mean3 = mean of the 10 absError_IC measurements for `Real_IC = 3` at Offcenter = 0

mean4 = mean of the 10 absError_IC measurements for `Real_IC = 5` at Offcenter = 0
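All four means can be computed in one step instead of four. A minimal sketch, assuming the data frame is called df with the columns shown above; the numbers are made up for illustration and only 2 repeats per tumor are used. tapply() averages absError_IC within each Real_IC level after restricting the data to the Offcenter == 0 rows:

```r
# Made-up example data (hypothetical values): 2 off-center positions,
# 4 Real_IC levels, 2 repeats each
df <- data.frame(
  Offcenter = rep(c(0, 2), each = 8),
  Real_IC   = rep(rep(c(0, 0.4, 3, 5), each = 2), 2),
  ROI_IC    = c(0.4, 0.2, 0.3, 0.5, 3.2, 3.0, 5.4, 5.0,
                0.1, 0.3, 0.6, 0.2, 2.9, 3.3, 4.8, 5.6)
)
df$absError_IC <- abs(df$ROI_IC - df$Real_IC)

# Mean absolute error per Real_IC level, using only the Offcenter == 0 rows;
# the result is a vector named by Real_IC ("0", "0.4", "3", "5")
at_center <- df[df$Offcenter == 0, ]
mean_abs_error <- tapply(at_center$absError_IC, at_center$Real_IC, mean)
print(mean_abs_error)
```

The result is named by Real_IC, so mean_abs_error["0"] plays the role of mean1, mean_abs_error["0.4"] of mean2, and so on.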

Basically, I want the average absolute error of each tumor at Offcenter = 0 (there are 4 tumor types with 4 different Real_IC values), and then I want to correct all tumors at the other Offcenter positions by these absolute error values derived from the Offcenter = 0 data.
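Written out as a formula, with o the Offcenter position, c the Real_IC level and r the Measurement_repeat index, and assuming (as in the asker's own answer further down) that correcting means subtracting the mean absolute error, mean1 to mean4 are ē(0), ē(0.4), ē(3) and ē(5) in:

```latex
\begin{aligned}
\bar{e}(c) &= \frac{1}{10} \sum_{r=1}^{10} \left| \mathrm{ROI\_IC}_{0,c,r} - c \right|,
  \qquad c \in \{0,\ 0.4,\ 3,\ 5\} \\
\mathrm{corrError\_IC}_{o,c,r} &= \mathrm{ROI\_IC}_{o,c,r} - \bar{e}(c)
\end{aligned}
```

Because absError_IC discards the sign of the error, this correction shifts every reading downward, including readings that were below the true value.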

I tried ifelse statements but I was not able to figure it out.

EDIT: Off-center has specific levels: c(-6,-4,-3,-2,-1,0,1,2,3,4,6)

Upvotes: 0

Answers (3)

camel_case

Reputation: 63

Here is how I would approach this problem.

1. Compute the mean of absError_IC grouped by Real_IC, using only the Offcenter == 0 rows.
2. Left-join the original data.frame with the grouped means.

Code Example

## replicate sample data sets
ROI_IC = c(0.4, 0.3, 0.3, 0.0, 0.0, -0.1, -0.2, -0.2, -0.1, 0.0, 
           0.4, 0.3, 0.2, 0.0, 0.0, -0.1, 0.1, 0.3, 0.6, 0.7)
df = data.frame("Offcenter"=rep(0, 40),
                "Measurement_repeat"=rep(1:10, 4),
                "Real_IC"=rep( c(0,0.4,3,5), each=10), 
                "ROI_IC"=rep(ROI_IC, 2), 
                stringsAsFactors=FALSE)
df$absError_IC = abs(df$Real_IC - df$ROI_IC)

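## compute mean of "absError_IC" over the Offcenter == 0 rows, grouped by "Real_IC"
## (the grouping vector must be subset in the same way as the values)
mean_values = aggregate(df[df$Offcenter==0, c("absError_IC")], 
                        by=list("Real_IC"=df$Real_IC[df$Offcenter==0]),
                        FUN=mean)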
names(mean_values)[which(names(mean_values)=="x")] = "MAE"

## left join to append column
df = merge(df, mean_values, by.x="Real_IC", by.y="Real_IC", all.x=T, all.y=F, sort=F)
## notice that column order shifts based on "key"
df[c(1:5, 10:15), ]
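The code above stops once the MAE column is attached. A minimal sketch of the remaining step, assuming the correction is ROI_IC minus MAE, on made-up data with two Offcenter positions. It also shows the formula interface of aggregate(), whose subset argument filters the values and the grouping column together, so they cannot get out of step:

```r
# Made-up example data (hypothetical values): Offcenter 0 and 2,
# Real_IC 0 and 3, 2 repeats each
df <- data.frame(
  Offcenter = rep(c(0, 2), each = 4),
  Real_IC   = rep(rep(c(0, 3), each = 2), 2),
  ROI_IC    = c(0.4, 0.2, 3.5, 3.3, 0.1, 0.3, 3.6, 3.0)
)
df$absError_IC <- abs(df$Real_IC - df$ROI_IC)

# Mean absolute error per Real_IC, computed from the Offcenter == 0 rows only
mean_values <- aggregate(absError_IC ~ Real_IC, data = df,
                         subset = Offcenter == 0, FUN = mean)
names(mean_values)[names(mean_values) == "absError_IC"] <- "MAE"

# Left join: every row (at every Offcenter) receives the MAE of its Real_IC
df <- merge(df, mean_values, by = "Real_IC", all.x = TRUE, sort = FALSE)

# Corrected concentration (assumption: subtract the mean absolute error)
df$corrError_IC <- df$ROI_IC - df$MAE
print(df)
```

Averaging over all rows instead would give 0.25 and 0.35 here rather than 0.3 and 0.4, which is an easy way to check that the filter is applied.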

I suggest using the data.table package, which is particularly useful when you need to manipulate large data sets.

library(data.table)
## dt = data.table(df) or dt = fread(<path>)
## dt[, absError_IC := abs(Real_IC - ROI_IC)]

## compute grouped mean over the Offcenter == 0 rows only
mean_values = dt[Offcenter == 0, list("MAE"=mean(absError_IC)), by=list(Real_IC)]

## left join
dt = merge(dt, mean_values, by.x="Real_IC", by.y="Real_IC", all.x=T, all.y=F, sort=F)

Upvotes: 1

user7937045

Reputation: 11

I found a way to compute what I want: I create an extra column that holds the average absolute error of each of the 4 Real_IC levels at Offcenter = 0, assigned to every row with that Real_IC level. In a second step, I subtract these values from ROI_IC. However, how can I simplify this code into a more general form? At the moment I calculate the average absolute errors from their row positions. Sorry, I am an absolute beginner ;(

Note: my data.frame is called ds_M.

# Mean absolute errors of the 4 Real_IC levels, stored as variables (Offcenter = 0 rows picked by row position)

average1<-mean(ds_M$absError_IC[1:10]) #for Real_IC=0
average2<-mean(ds_M$absError_IC[11:20]) #for Real_IC=0.4
average3<-mean(ds_M$absError_IC[21:30]) #for Real_IC=3
average4<-mean(ds_M$absError_IC[31:40]) #for Real_IC=5

# New column assigning the correction factor to each Real_IC level
ds_M$absCorr[ds_M$Real_IC==0]<-average1
ds_M$absCorr[ds_M$Real_IC==0.4]<-average2
ds_M$absCorr[ds_M$Real_IC==3]<-average3
ds_M$absCorr[ds_M$Real_IC==5]<-average4

# Calculate new column with corrected ROI_ICs
ds_M$corrError_IC<-ds_M$ROI_IC - ds_M$absCorr
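To answer the follow-up question: both the row positions and the four separate assignments can be replaced by a lookup on the Real_IC value. A minimal sketch, keeping the ds_M name from above but with a small made-up data set whose rows are deliberately unsorted. tapply() computes one mean per Real_IC level from the Offcenter == 0 rows, and indexing that named vector with as.character(ds_M$Real_IC) gives every row its correction factor, so the code no longer depends on how the rows are ordered:

```r
# Made-up example data (hypothetical values); rows deliberately not sorted
ds_M <- data.frame(
  Offcenter = c(2, 0, 0, 2, 0, 2, 0, 2),
  Real_IC   = c(0, 0.4, 0, 0.4, 0.4, 0, 0, 0.4),
  ROI_IC    = c(0.1, 0.3, 0.4, 0.6, 0.5, -0.1, 0.2, 0.4)
)
ds_M$absError_IC <- abs(ds_M$ROI_IC - ds_M$Real_IC)

# One mean absolute error per Real_IC level, from the Offcenter == 0 rows
# (replaces average1 ... average4 and their hard-coded row ranges)
center <- ds_M$Offcenter == 0
averages <- tapply(ds_M$absError_IC[center], ds_M$Real_IC[center], mean)

# Look up each row's correction factor by its Real_IC value
# (replaces the four ds_M$absCorr[ds_M$Real_IC == ...] assignments)
ds_M$absCorr <- as.vector(averages[as.character(ds_M$Real_IC)])

# Corrected ROI_IC, as in the original code
ds_M$corrError_IC <- ds_M$ROI_IC - ds_M$absCorr
```

The same three lines work unchanged for any number of Real_IC levels.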

Upvotes: 0

Parfait

Reputation: 107747

Consider ave for inline aggregation: its first argument is the numeric field to summarize, the following arguments are the grouping fields, and the last argument, which must be passed by name as FUN, is the summary function: ave(num_vector, ..., FUN=func).

df$corrError_IC <- with(df, ave(absError_IC, Real_IC, FUN=mean))

To handle NAs, pass FUN an anonymous function that calls mean with na.rm=TRUE:

df$corrError_IC <- with(df, ave(absError_IC, Real_IC, FUN=function(x) mean(x, na.rm=TRUE)))
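As written, ave() averages absError_IC over all Offcenter positions and stores that group mean itself, not a corrected concentration. A sketch of a variant, assuming the correction subtracts the mean absolute error at Offcenter == 0 (MAE is only an illustrative column name, and the data are made up): setting the off-center errors to NA and averaging with na.rm=TRUE restricts each group mean to the Offcenter == 0 rows, while ave() still returns one value per row.

```r
# Made-up example data (hypothetical values)
df <- data.frame(
  Offcenter = rep(c(0, 3), each = 4),
  Real_IC   = rep(c(0, 0, 5, 5), 2),
  ROI_IC    = c(0.3, 0.1, 5.6, 5.2, 0.5, -0.2, 4.5, 5.9)
)
df$absError_IC <- abs(df$ROI_IC - df$Real_IC)

# Keep only the Offcenter == 0 errors; all other rows become NA
center_error <- with(df, ifelse(Offcenter == 0, absError_IC, NA))

# Per-row mean of the Offcenter == 0 errors within the same Real_IC group
df$MAE <- ave(center_error, df$Real_IC,
              FUN = function(x) mean(x, na.rm = TRUE))

# Corrected concentration (assumption: subtract the mean absolute error)
df$corrError_IC <- df$ROI_IC - df$MAE
```

If a Real_IC group had no Offcenter == 0 rows, its mean would come out as NaN, which makes the gap visible rather than silently filling it.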

Upvotes: 0
