Applying functions to dataframe or multiple lists

Question

Edit, as per the comments: the OP would like to calculate:

100 * (1 - 10^-(Do - Do[Do==0])) / (1 - 10^-(Do[Do==100] - Do[Do==0])) - Do

For each combination of Cl, In, Sa in the data.frame
-RS

I am trying to apply a function, called dG, to a dataframe. Because the function's arguments differ in length, recycling produced unpredictable results.

To rectify this, I split the dataframe into lists and tried to apply the dG function (below) to each list, after identifying each list with a helper function called 'ids'.

Please feel free to suggest a different solution. FYI, my specific requests start with bullet points

Please let me start by providing synthetic data that shows the issues:

Do <- rep(c(0,2,4,6,8,10,15,20,30,40,45,50,55,60,65,70,80,85,90,92,94,96,98,100), each=16,times=16)
Cl <- rep(c("K", "Y","M","C"), each= 384, times=4)
In <- rep(c("A", "S"), each=3072)
Sa <- rep(c(1,2), each=1536)
Data <- rnorm(6144)
DataFrame <- cbind.data.frame(Do,Cl,In,Sa,Data); head(DataFrame)
rm(Do,Cl,In,Sa,Data)
attach(DataFrame)

DFSplit <- split(DataFrame[ , "Data"], list(Do, Cl, In, Sa))

The function 'ids' is a helper function that identifies the list names:

ids <- function(Do, Cl, In, Sa){
    # "\\." matches the literal dot that split() puts between the name parts
    grep( paste( "^" , Do, "\\.",
                Cl, "\\.",
                In,
                "\\.", Sa,sep=""),
         names(DFSplit), value = TRUE)}

mapply(ids, Do, Cl, In, Sa, SIMPLIFY = FALSE)

The above mapply produces 6144 lists. If you look at the mapply output, you will notice that there are only 384 unique list names, each repeated 16 times (384 * 16 = 6144).

How can I change the 'ids' function so that mapply doesn't repeat the same name 16 times?

As an ugly and highly costly solution I used unique; I need a better fundamental solution.

unique(mapply(ids, Do, Cl, In, Sa, SIMPLIFY = FALSE))

The dG function is the one that I want to operate on each of the 'DFSplit' lists. It has the same issue as the previous ids function, which it uses as an input.

dG <- function(Do,Cl, In, Sa){
    dg <- 100*
                (1-10^-( DFSplit[[ids(Do,  Cl, In, Sa)]] - DFSplit[[ids(0, Cl, In, Sa)]])) /
                (1-10^-( DFSplit[[ids(100, Cl, In, Sa)]] - DFSplit[[ids(0, Cl, In, Sa)]])) - Do
    dg}

I tried to use dG as follows and it is not what I want.

dG(Do,Cl, In, Sa)

It only evaluated the LAST part of the dG function (- Do) and gave this warning:

In grep(paste("^", unique(Do), "\.", unique(Cl), "\.", unique(In), : argument 'pattern' has length > 1 and only the first element will be used

Can you suggest a modification to the dG function?

Then I tried mapply

mapply(dG, Do, Cl, In, Sa, SIMPLIFY = FALSE)

mapply correctly evaluated the function with my data, but it produces 6144 lists. The output is basically 384 unique lists, each repeated 16 times (384 * 16 = 6144).

How can I modify the dG function to get rid of the useless and time consuming repetition?

My thoughts would be:

Eliminate the repetition in my first function 'ids', which I do not know how to do.
Change the arguments of the second function so that their lengths would be 384, maybe by using the names of the lists as an input argument, which I do not know how to do.
Change the dG formula so that it does not use the (Do, Cl, In, Sa) arguments, since each one has a length of 6144.

Ricardo Saporta · Accepted Answer

UPDATE:

The comment you made to @Roland was all you had to put in each of your previous related questions, this one included.

The entirety of your process can be handled in one line of code:

library(data.table)
myDT <- data.table(DataFrame)

# the trailing "- Do" matches the formula stated at the top of the question
myDT[ , "TVI" :=  100 * (1 - 10^-(Data - Data[Do==0])) / (1 - 10^-(Data[Do==100] - Data[Do==0])) - Do
      , by=list(Cl, In, Sa)]

# this is your Tonal Value Increase
myDT$TVI
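
For readers who want to see what the grouped formula does without data.table, here is a minimal base R sketch on a toy data frame. The names `toy`, `rep`, and `tvi_one_group` are made up for illustration, the values are not the OP's data, and only one grouping column (Cl) is used to keep it short. It assumes, as in the OP's synthetic data, that every Do level has the same number of readings and that rows are sorted by Do within each group, so that recycling pairs replicate with replicate.

```r
# Toy stand-in for DataFrame: 3 Do levels, 2 Cl levels, 2 replicates per cell.
# (Assumed values, chosen only so the result is easy to check by hand.)
toy <- expand.grid(rep = 1:2, Do = c(0, 50, 100), Cl = c("K", "Y"),
                   stringsAsFactors = FALSE)
toy$Data <- 0.1 + toy$Do / 100 + 0.01 * toy$rep + ifelse(toy$Cl == "Y", 0.05, 0)

# The formula for a single group: every reading is compared with the
# Do == 0 and Do == 100 readings of the same group.
tvi_one_group <- function(g) {
  d0   <- g$Data[g$Do == 0]     # shorter vector, recycled across the group
  d100 <- g$Data[g$Do == 100]
  100 * (1 - 10^-(g$Data - d0)) / (1 - 10^-(d100 - d0)) - g$Do
}

# Split by group, apply the formula, and put the results back in row order.
groups  <- split(toy, toy$Cl)
toy$TVI <- unsplit(lapply(groups, tvi_one_group), toy$Cl)

toy
```

By construction the result is 0 at Do == 0 and at Do == 100, which makes a quick sanity check on real data as well.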

original answer:

It's still awfully unclear what you are trying to accomplish. However, here are two concepts that should be able to save you a world of headaches.

First, you do not need your `ids` function. You can get more mileage out of `expand.grid`:

myIDs <- expand.grid(unique(Do), unique(Cl), unique(In), unique(Sa))

# You can then use something like
do.call(paste, c(myIDs, sep="."))
# to get the same names as names(DFSplit) (possibly in a different order).
# Or whatever other function suits
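
As a small illustration of the `expand.grid` idea, the sketch below builds one name per unique combination on a toy data frame and compares the result with the names that `split` produces. The objects `toy`, `toySplit`, `toyIDs`, and `toyNames` are hypothetical, and only two grouping columns are used to keep it short.

```r
# Toy stand-in for the OP's columns (assumed values, for illustration only).
toy <- data.frame(Do   = rep(c(0, 50, 100), each = 2, times = 2),
                  Cl   = rep(c("K", "Y"), each = 6),
                  Data = seq_len(12))
toySplit <- split(toy$Data, list(toy$Do, toy$Cl))

# One row per unique combination, then one dot-separated name per row.
toyIDs   <- expand.grid(Do = unique(toy$Do), Cl = unique(toy$Cl))
toyNames <- do.call(paste, c(toyIDs, sep = "."))

length(toyNames)                      # one name per combination, no repeats
setequal(toyNames, names(toySplit))   # same set of names as split() produced
```

Because the names come from the unique values rather than from the full-length columns, nothing is repeated and no call to `unique` is needed afterwards.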

However, even that is not necessary.

Here is the equivalent of your `dG` function using `data.table`.

Notice there is no need for any of the splitting or ids or anything like that.
Everything is handled by the `by` argument in data.table.

library(data.table)
myDT <- data.table(DataFrame)

myDT

# Evaluate the formula once per (Cl, In, Sa) group; within each group,
# Data[Do==0] and Data[Do==100] are recycled against all readings.
dG_DT <- myDT[ , list(Do,
                      dG = 100 *
                           (1 - 10^-(Data - Data[Do==0])) /
                           (1 - 10^-(Data[Do==100] - Data[Do==0])) -
                           Do)
               , by=list(Cl, In, Sa)]

dG_DT
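
One caveat: the grouped formula relies on R recycling `Data[Do==0]` and `Data[Do==100]` across each group, which only lines up if every Do level has the same number of readings and the rows are sorted by Do within each group. That holds for the synthetic data in the question, but it is an assumption worth checking on real data. A rough base R check, with hypothetical names (`toy`, `cellCounts`, `balanced`, `sortedWithinGroup`) and a single grouping column for brevity:

```r
# Toy frame (assumed values): 2 colours x 3 Do levels x 2 replicates.
toy <- data.frame(Do = rep(c(0, 50, 100), each = 2, times = 2),
                  Cl = rep(c("K", "Y"), each = 6))

# Number of rows in every (Cl, Do) cell.
cellCounts <- table(toy$Cl, toy$Do)

# Recycling pairs readings correctly only if all cells are the same size ...
balanced <- all(cellCounts == cellCounts[1])

# ... and Do is non-decreasing within each group.
sortedWithinGroup <- all(tapply(toy$Do, toy$Cl, function(d) !is.unsorted(d)))

balanced && sortedWithinGroup
```

If either check fails, sorting the data by the grouping columns and Do, or matching readings on an explicit replicate id, would be safer than relying on recycling.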
