Ragy Isaac

Reputation: 1458

Applying functions to dataframe or multiple lists

Edit per the comments: the OP would like to calculate:

100 * (1 - 10^-(Do - Do[Do==0])) / (1 - 10^-(Do[Do==100] - Do[Do==0])) - Do

for each combination of Cl, In, Sa in the data.frame.
-RS


I am trying to apply a function, called dG, to a dataframe. Because the function's arguments differ in length, recycling produces unpredictable results.

To rectify this issue I split the dataframe into lists and tried to apply the dG function (below) to each list, after identifying each list with a helper function called 'ids'.

Please let me start by providing synthetic data that shows the issues:

Do <- rep(c(0,2,4,6,8,10,15,20,30,40,45,50,55,60,65,70,80,85,90,92,94,96,98,100), each=16,times=16)
Cl <- rep(c("K", "Y","M","C"), each= 384, times=4)
In <- rep(c("A", "S"), each=3072)
Sa <- rep(c(1,2), each=1536)
Data <- rnorm(6144)
DataFrame <- cbind.data.frame(Do,Cl,In,Sa,Data); head(DataFrame)
rm(Do,Cl,In,Sa,Data)
attach(DataFrame)

DFSplit <- split(DataFrame[ , "Data"], list(Do, Cl, In, Sa))

The function 'ids' is a helper function that identifies the list names:

ids <- function(Do, Cl, In, Sa){
    grep( paste( "^" , Do, "\\.",
                Cl, "\\.",
                In,
                "\\.", Sa,sep=""),
         names(DFSplit), value = TRUE)}

mapply(ids, Do, Cl, In, Sa, SIMPLIFY = FALSE)

The above mapply produces 6144 lists. If you look at the mapply output you will notice that there are only 384 unique list names, each repeated 16 times (384 * 16 = 6144).

As an ugly and costly workaround I used unique; I need a more fundamental solution.

unique(mapply(ids, Do, Cl, In, Sa, SIMPLIFY = FALSE))

The dG function is the one that I want to operate on each of the 'DFSplit' lists. It takes the ids function as an input, so it has the same issue as ids.

dG <- function(Do,Cl, In, Sa){
    dg <- 100*
                (1-10^-( DFSplit[[ids(Do,  Cl, In, Sa)]] - DFSplit[[ids(0, Cl, In, Sa)]])) /
                (1-10^-( DFSplit[[ids(100, Cl, In, Sa)]] - DFSplit[[ids(0, Cl, In, Sa)]])) - Do
    dg}

I tried to use dG as follows, but the result is not what I want.

dG(Do,Cl, In, Sa)

It only evaluated the last part of the dG function (- Do) and gave this warning:

In grep(paste("^", unique(Do), "\.", unique(Cl), "\.", unique(In), : argument 'pattern' has length > 1 and only the first element will be used

Then I tried mapply

mapply(dG, Do, Cl, In, Sa, SIMPLIFY = FALSE)

mapply correctly evaluated the function with my data, but it produces 6144 lists. The output is basically 384 unique lists, each repeated 16 times (384 * 16 = 6144).

My thoughts are:

  1. Eliminate the repetition in my first function 'ids', which I do not know how to do.
  2. Change the arguments of the second function so that their lengths are 384, perhaps by using the names of the lists as an input argument, which I also do not know how to do.
  3. Change the dG formula so that it does not use the (Do, Cl, In, Sa) arguments, since each one has a length of 6144.

Upvotes: 0

Views: 328

Answers (1)

Ricardo Saporta

Reputation: 55420

UPDATE:

The comment you made to @Roland was all you had to put in each of your previous related questions, this one included.

The entirety of your process can be handled in one line of code:

library(data.table)
myDT <- data.table(DataFrame)

myDT[ , "TVI" :=  100 * (1 - 10^-(Data - Data[Do==0])) / (1 - 10^-(Data[Do==100] - Data[Do==0])) 
      , by=list(Cl, In, Sa)]

# this is your Tone Value Increase
myDT$TVI
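
The formula quoted in the question ends with "- Do", while the one-liner above stores the tone value without that subtraction. As a minimal sketch, assuming the subtraction is wanted, the same grouped calculation can be tried on a tiny table whose result is easy to check by eye. The table `toy`, its density values, and the intermediate column `TV` are made up for illustration and are not part of the original data.

```r
library(data.table)

# Hypothetical toy densities: one reading per tone step, two colours
toy <- data.table(
  Do   = rep(c(0, 50, 100), times = 2),
  Cl   = rep(c("K", "C"), each = 3),
  Data = c(0.10, 0.50, 1.50,   # K
           0.05, 0.40, 1.20)   # C
)

# Tone value per colour: inside each group, Data[Do == 0] and
# Data[Do == 100] are that group's own paper and solid readings
toy[, TV := 100 * (1 - 10^-(Data - Data[Do == 0])) /
                  (1 - 10^-(Data[Do == 100] - Data[Do == 0])),
    by = Cl]

# Tone value increase: computed tone value minus the nominal value Do
toy[, TVI := TV - Do]

print(toy)
```

In this sketch the increase comes out as zero at Do == 0 and Do == 100 for every colour, which is a quick sanity check that the grouping picked up the right reference readings.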


original answer:

It's still awfully unclear what you are trying to accomplish. However, here are two concepts that should be able to save you a world of headaches.

First, you do not need your ids function. You can get more mileage out of expand.grid:

myIDs <- expand.grid(unique(Do), unique(Cl), unique(In), unique(Sa))

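# You can then use something like 
do.call(paste, c(myIDs, sep="."))
# to get the same names (paste needs one argument per column here;
# paste(row, sep=".") on a single row vector would not join its elements).
# Or whatever other function suits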
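
To see how labels built from expand.grid line up with the names that split generates, here is a small base R sketch. The data frame `df` and the names `ids_grid`, `labels` and `pieces` are hypothetical stand-ins, and only three grouping columns are used to keep it short.

```r
# Hypothetical stand-in for the real data: 3 tone steps x 2 colours x 2 inks
df <- data.frame(
  Do = rep(c(0, 50, 100), times = 4),
  Cl = rep(c("K", "C"), each = 6),
  In = rep(rep(c("A", "S"), each = 3), times = 2)
)
df$Data <- seq_len(nrow(df))

# One row per combination of the unique values
ids_grid <- expand.grid(Do = unique(df$Do),
                        Cl = unique(df$Cl),
                        In = unique(df$In))

# Paste the columns together element-wise to get "Do.Cl.In" labels
labels <- do.call(paste, c(ids_grid, sep = "."))

# split() builds its names from the same combinations
pieces <- split(df$Data, list(df$Do, df$Cl, df$In))

print(setequal(labels, names(pieces)))
```

The order of the labels can differ from the order of `names(pieces)`, so the comparison is done on sets rather than position by position.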

However, even that is not necessary.


Here is the equivalent of your dG function using data.table.

Notice there is no need for any of the splitting or ids or anything like that.
Everything is handled by the by argument in data.table.

library(data.table)
myDT <- data.table(DataFrame)

myDT

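# Each "1 - 10^-(...)" term needs its own parentheses and a negative exponent;
# otherwise R evaluates 100 * 1 - 10^(...) / 1 - ... by operator precedence.
# Caveat: the Do==0 and Do==100 subsets are shorter than the full Data column
# and get recycled, so the grouped one-liner in the UPDATE is the safer form.
dG_DT <- 
    100 * 
    (1 - 10^-(  myDT[ ,     Data, by=list(Do, Cl, In, Sa)][, Data] 
              - myDT[Do==0, Data, by=list(Do, Cl, In, Sa)][, Data]
             )) / 

    (1 - 10^-(  myDT[Do==100, Data, by=list(Do, Cl, In, Sa)][, Data]
              - myDT[Do==0,   Data, by=list(Do, Cl, In, Sa)][, Data]
             )) - 
    myDT[, Do]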

dG_DT
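
For readers who prefer to stay with the split-based approach from the question, the same group-wise idea can be sketched in base R with split, lapply and unsplit instead of data.table. This is only an illustration under assumed toy data: `df`, `tvi_one` and the density values are hypothetical, and a single grouping column stands in for Cl, In and Sa.

```r
# Hypothetical toy data: two colours, three tone steps each
df <- data.frame(
  Do   = rep(c(0, 50, 100), times = 2),
  Cl   = rep(c("K", "C"), each = 3),
  Data = c(0.10, 0.50, 1.50, 0.05, 0.40, 1.20)
)

# Formula from the question, applied to the rows of one group
tvi_one <- function(g) {
  d0   <- g$Data[g$Do == 0]    # reading at Do == 0 for this group
  d100 <- g$Data[g$Do == 100]  # reading at Do == 100 for this group
  100 * (1 - 10^-(g$Data - d0)) / (1 - 10^-(d100 - d0)) - g$Do
}

# Split into one data frame per group, compute, then put the results
# back in the original row order
groups <- split(df, df$Cl)
df$TVI <- unsplit(lapply(groups, tvi_one), df$Cl)

print(df)
```

Because each group carries its own rows, the function never sees vectors of mismatched length, which is the recycling problem described in the question.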

Upvotes: 5
