Easiest way to apply series of calculations to similar data frames in R

Question

The following is an example of how I want to treat my data sets. It might be a bit difficult to understand how my data frame is structured, but I hope it makes sense:

First, density must be calculated for columns A, B, and C using raw data from columns ADry, AEthanol, BDry, and so on. (Since these were also defined earlier as vectors, I used the vectors instead of the data frame columns because it was shorter: ADry_1_0 instead of Sample_1_0$ADry_1_0.)

Sample_1_0$ADensi_1_0=(ADry_1_0/(ADry_1_0-AEthanol_1_0))*(peth-pair)+pair 
Sample_1_0$BDensi_1_0=(BDry_1_0/(BDry_1_0-BEthanol_1_0))*(peth-pair)+pair
Sample_1_0$CDensi_1_0=(CDry_1_0/(CDry_1_0-CEthanol_1_0))*(peth-pair)+pair
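The three lines above differ only in the column letter, so the formula can be wrapped in one small function. This is a minimal sketch: the function name density_from_weights and all numeric values are made up for illustration, and peth and pair are simply passed through as in the lines above.

```r
# Hypothetical helper: same density formula as above, written once.
density_from_weights <- function(dry, ethanol, peth, pair) {
  (dry / (dry - ethanol)) * (peth - pair) + pair
}

# Made-up values, not real measurements.
peth <- 0.789
pair <- 0.0012
dry <- c(2.0, 2.1)
ethanol <- c(1.2, 1.3)

dens <- density_from_weights(dry, ethanol, peth, pair)
```

Because the arithmetic is vectorised, the function returns one density per measurement, just like the original expressions.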

This yields 10 densities each for A, B, and C. What we are interested in is the mean density:

Mean_1_0=apply(Sample_1_0[7:9],2,mean)

Next, standard deviations are found. We are mainly interested in the standard deviations of our raw data columns (ADry and AEthanol), because error propagation calculations are carried out afterwards to find out how the deviations add up when calculating the densities:

StdAfv_1_0=apply(Sample_1_0,2,sd)
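As a side note, the column-wise mean and standard deviation can also be computed with colMeans and sapply, which avoids converting the data frame to a matrix first. A small sketch with a made-up stand-in data frame (sample_df and its values are hypothetical):

```r
# Made-up stand-in for the density columns of Sample_1_0.
sample_df <- data.frame(ADensi = c(1.9, 2.1, 2.0),
                        BDensi = c(2.0, 2.2, 2.1),
                        CDensi = c(1.8, 2.0, 2.2))

col_means <- colMeans(sample_df)    # same values as apply(sample_df, 2, mean)
col_sds   <- sapply(sample_df, sd)  # same values as apply(sample_df, 2, sd)
```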

Error propagation (same for B and C)

ASd_1_0=(sqrt((sd(Sample_1_0$ADry_1_0)/mean(Sample_1_0$ADry_1_0))^2+(sqrt((sd(Sample_1_0$ADry_1_0)^2+sd(Sample_1_0$AEthanol_1_0)^2))/(mean(Sample_1_0$ADry_1_0)-mean(Sample_1_0$AEthanol_1_0)))^2))*mean(Sample_1_0$ADensi_1_0)
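The expression above is hard to read because every term is spelled out in full. A sketch of the same calculation as a function, with the intermediate terms named; the function name propagated_sd and the example numbers are assumptions made for illustration:

```r
# Hypothetical helper mirroring the ASd_1_0 expression above.
propagated_sd <- function(dry, ethanol, density) {
  rel_dry  <- sd(dry) / mean(dry)                          # relative sd of the dry weight
  rel_diff <- sqrt(sd(dry)^2 + sd(ethanol)^2) /
    (mean(dry) - mean(ethanol))                            # relative sd of (dry - ethanol)
  sqrt(rel_dry^2 + rel_diff^2) * mean(density)
}

# Made-up values, not real measurements.
dry     <- c(2.0, 2.1, 2.3)
ethanol <- c(1.2, 1.3, 1.35)
density <- (dry / (dry - ethanol)) * (2 - 1) + 1

a_sd <- propagated_sd(dry, ethanol, density)
```

The same function can then be reused for B and C by passing the corresponding columns.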

In the end we semi-manually gathered the final information (mean density and its deviation) in a plottable data frame. Some of the code might be a tad long, and maybe we could have achieved the same results with shorter code, but bear with us, we are rookies.

So now to the actual problem:

This was for A_1_0, B_1_0, and C_1_0. We would like to apply the same series of commands to 15 other data frames. The dimensions are the same, and they will be named A_1_1, A_1_2, A_2_0 and so on.

Is it possible to use some kind of loop, or to write a loadable script containing x and y placeholders, where we can easily insert A_1_1, for instance?

Thanks in advance. I tried to keep the amount of confusion at a minimum, although it's tough!


tbradley · Accepted Answer

If, instead of individual vectors, you combine the raw data into data frames (or, even better, data.tables) and then store the data frames for all runs in a list, as @Gregor suggested, you can use the function below together with lapply.
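One way to build such a list is sketched below, assuming the data frames already exist in the workspace and share a common name prefix. The two data frames and their values are made up; list_datasets is the name used in the call further down.

```r
# Two made-up data frames standing in for the real samples.
Sample_1_0 <- data.frame(ADry = c(2.0, 2.1), AEthanol = c(1.2, 1.3))
Sample_1_1 <- data.frame(ADry = c(2.2, 2.3), AEthanol = c(1.4, 1.5))

# Collect every object whose name starts with "Sample_" into a named list.
list_datasets <- mget(ls(pattern = "^Sample_"))

# Any function can now be applied to all data frames in one call.
row_counts <- sapply(list_datasets, nrow)
```

Keeping the original object names as list names makes it easy to tell afterwards which result belongs to which sample.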

my_func <- function(dataset, peth, pair){
  require(data.table)
  # Assumes columns 1-3 hold the dry weights (A, B, C) and columns 4-6 the ethanol weights
  names <- names(dataset)
  # Add the density columns by reference, then collapse everything to one summary row
  setDT(dataset)[, `:=` (ADens = (get(names[1])/(get(names[1])-get(names[4])))*(peth-pair)+pair,
                         BDens = (get(names[2])/(get(names[2])-get(names[5])))*(peth-pair)+pair,
                         CDens = (get(names[3])/(get(names[3])-get(names[6])))*(peth-pair)+pair)
                 ][,  .(ADens_mean = mean(ADens),
                           ADens_sd = sd(ADens),
                           # Error propagation: the outer sqrt spans both squared terms, as in the question
                           AErr = sqrt((sd(get(names[1]))/mean(get(names[1])))^2 +
                                     (sqrt(sd(get(names[1]))^2 + sd(get(names[4]))^2)/
                                        (mean(get(names[1])) - mean(get(names[4]))))^2) * mean(ADens),
                           BDens_mean = mean(BDens),
                           BDens_sd = sd(BDens),
                           BErr = sqrt((sd(get(names[2]))/mean(get(names[2])))^2 +
                                     (sqrt(sd(get(names[2]))^2 + sd(get(names[5]))^2)/
                                        (mean(get(names[2])) - mean(get(names[5]))))^2) * mean(BDens),
                           CDens_mean = mean(CDens),
                           CDens_sd = sd(CDens),
                           CErr = sqrt((sd(get(names[3]))/mean(get(names[3])))^2 +
                                     (sqrt(sd(get(names[3]))^2 + sd(get(names[6]))^2)/
                                        (mean(get(names[3])) - mean(get(names[6]))))^2) * mean(CDens))
                   ]
}

rbindlist(lapply(list_datasets, my_func, peth = 2, pair = 1))

Now, this assumes that you put your raw vectors into data frames with the columns in the order in which they appeared in your example (and that they are the only columns in the data set). If this is not the case, you may just have to edit the indices in the names[x] calls. If you want a little more flexibility, you could also define a list of column-name vectors, one per raw data set, pass the matching entry to my_func as an extra argument (say, list_column_names), and replace every get(names[x]) with get(list_column_names[x]).
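The name-based variant could look roughly like the sketch below. It is written in base R rather than data.table to keep it short, handles a single dry/ethanol pair, and uses made-up data; summarise_density and the column-name entries are hypothetical names, not part of the function above.

```r
# Hypothetical variant: column names are passed in instead of relying on position.
summarise_density <- function(dataset, dry_col, eth_col, peth, pair) {
  dry  <- dataset[[dry_col]]
  eth  <- dataset[[eth_col]]
  dens <- (dry / (dry - eth)) * (peth - pair) + pair
  err  <- sqrt((sd(dry) / mean(dry))^2 +
               (sqrt(sd(dry)^2 + sd(eth)^2) / (mean(dry) - mean(eth)))^2) * mean(dens)
  data.frame(Dens_mean = mean(dens), Dens_sd = sd(dens), Err = err)
}

# Made-up data sets whose column names differ per sample.
list_datasets <- list(
  Sample_1_0 = data.frame(ADry_1_0 = c(2.0, 2.1, 2.2), AEthanol_1_0 = c(1.2, 1.3, 1.35)),
  Sample_1_1 = data.frame(ADry_1_1 = c(2.3, 2.4, 2.6), AEthanol_1_1 = c(1.4, 1.5, 1.6))
)

# One vector of column names per data set, in the same order as list_datasets.
list_column_names <- list(
  Sample_1_0 = c(dry = "ADry_1_0", eth = "AEthanol_1_0"),
  Sample_1_1 = c(dry = "ADry_1_1", eth = "AEthanol_1_1")
)

# Walk over both lists in parallel and stack the one-row results.
results <- do.call(rbind, Map(function(d, cols) {
  summarise_density(d, cols[["dry"]], cols[["eth"]], peth = 2, pair = 1)
}, list_datasets, list_column_names))
```

Map pairs each data frame with its own column names, so the function body no longer depends on column order.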

Combined with the rbindlist call, this should output a data.table with one row per data set (1-16) and 9 columns (ADens_mean, ADens_sd, AErr, ...).

NOTE: since there was no actual data to work with, I can't say for sure that this function does exactly what you want, but I think it will be close. This will also require you to install the data.table package.
