Reputation: 35

How to get in a specific order the results of an r lapply function with arguments from a dataframe

Following a previous question I asked, I got an awesome answer.

Here is a quick summary: I want to compute a multidimensional development index based on South African data for several years. My list holds individual-level information for each year, so df1 covers year 1 and df2 covers year 2.

df1<-data.frame(var1=c(1, 1,1), var2=c(0,0,1), var3=c(1,1,0))
df2<-data.frame(var1=c(1, 0,1), var2=c(1,0,1), var3=c(0,1,0))
mylist <-list (df1,df2)

var1 could be each person's stance on religion, var2 how they voted in the last national election, and so on. In this very simple case, I have data for 3 different persons each year. From there, I compute an index based on some of the variables (not all of them). Here is a very simplified working index function that uses only 2 of the 3 variables, named dimX and dimY:

myindex <- function(x, dimX, dimY){
    econ_i<- ( x[dimX]+  x[dimY] ) 
    return ( (1/length(econ_i))*sum(econ_i) )
    }
myindex(df1, "var2", "var3")

and

myindex2 = function(x, d) {
    myindex(x, d[1], d[2])
}

Then I have a data frame of the variables I want to use for my index. I am trying to compute the index for several sets of variables.

args <- data.frame(set1=c("var1", "var2"), set2=c("var2", "var3"), stringsAsFactors = FALSE)

I'd like the result structured as (a) list(set1 = list(df1, df2), set2 = list(df1, df2)) instead of (b) list(df1 = list(set1, set2), df2 = list(set1, set2)). Case (a) represents a time series: for each set of variables, I have the index results for every year. Case (b) is the opposite: for each year, I have the index results for every set of variables. Each individual result should be a single numeric value. Hence, I am expecting a list of 2 sublists, set1 and set2, each containing 2 numeric values (one per year).

I've been advised to use this great command:

lapply(mylist, function(m) lapply(args, myindex2, x = m))

It's working great, but I get the result in the "wrong" format, namely the second one (b) I showed. How could I get the results ordered per set (i.e. case (a) as time series) instead of per year?

Thanks a lot for your help!

EDIT: I've managed to find a workaround that doesn't strictly answer the question, but still gets my data in the desired order. Namely, I'm converting my list of lists to a matrix, which I then simply transpose.
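A minimal sketch of that matrix workaround, using the toy data from above. The names given to the elements of mylist and the helper objects res_b, mat and res_a are assumptions added for illustration, not part of my real code:

```r
df1 <- data.frame(var1 = c(1, 1, 1), var2 = c(0, 0, 1), var3 = c(1, 1, 0))
df2 <- data.frame(var1 = c(1, 0, 1), var2 = c(1, 0, 1), var3 = c(0, 1, 0))
mylist <- list(df1 = df1, df2 = df2)  # names added here so the output is labelled

myindex <- function(x, dimX, dimY) {
  econ_i <- x[dimX] + x[dimY]
  (1 / length(econ_i)) * sum(econ_i)
}
myindex2 <- function(x, d) myindex(x, d[1], d[2])

args <- data.frame(set1 = c("var1", "var2"), set2 = c("var2", "var3"),
                   stringsAsFactors = FALSE)

# Layout (b): one sublist per year, each holding one value per set
res_b <- lapply(mylist, function(m) lapply(args, myindex2, x = m))

# Flatten each year's sublist and bind the years together:
# rows are sets, columns are years; t(mat) flips that orientation
mat <- sapply(res_b, unlist)
t(mat)

# If a nested list in layout (a) is still wanted, rebuild it row by row
res_a <- lapply(rownames(mat), function(s) as.list(mat[s, ]))
names(res_a) <- rownames(mat)
str(res_a)
```

Whether mat or t(mat) is the convenient orientation depends on what comes next; each row of mat is already the time series for one set of variables.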

Upvotes: 0

Answers (2)

Pierre-Jean Cottalorda

Reputation: 35

In case it helps, here is my actual index function, taken from this article:

RCI_a_3det <-function(x, econ1, econ2, econ3, perso1, perso2, perso3, civic1, civic2, civic3){ 

    econ_i<- (1/3) *( x[econ1]+  x[econ2] + x[econ3]) 
    perso_i<- (1/3)*( x[perso1] + x[perso2] + x[perso3]) 
    civic_i<- (1/3)*(x[civic1] + x[civic2] + x[civic3]) 

    daf <- data.frame(econ_i, perso_i, civic_i) 
    colnames(daf)<- c("econ_i", "perso_i", "civic_i") 
    df1 <- subset(daf, daf$econ_i !=1 & daf$perso_i !=1 & daf$civic_i!=1 )

    sum_xik <- (df1$econ_i + df1$perso_i + df1$civic_i)

    return ( 1/(3*nrow(df1)) * sum(sum_xik, na.rm=T))

    }

Edit: x is a list of all personal information, for every variable and for every year. It is pretty large. I am using 9 variables to compute this index, but I actually have 30 such variables in my data, so I have set up a dataframe of sets of variables I could use to compute this index. This is the equivalent of my args df in the simple example. I am actually using 200 such combinations.
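A rough sketch of how one column of nine variable names from such a data frame could be handed to a nine-argument function. Everything here is a hypothetical stand-in: toy_index only mimics the argument shape of RCI_a_3det (it does not reproduce its formula), and one_year, var_sets and apply_set are made-up names with made-up data:

```r
# Toy stand-in with the same nine-argument shape as RCI_a_3det (hypothetical)
toy_index <- function(x, econ1, econ2, econ3, perso1, perso2, perso3,
                      civic1, civic2, civic3) {
  vars <- c(econ1, econ2, econ3, perso1, perso2, perso3, civic1, civic2, civic3)
  mean(rowMeans(x[, vars]))
}

# Made-up data for one year: 5 persons, 12 binary variables named V1..V12
one_year <- as.data.frame(matrix(rep(c(0, 1), 30), nrow = 5))

# Each column is one set of nine variable names (the analogue of args)
var_sets <- data.frame(set1 = paste0("V", 1:9),
                       set2 = paste0("V", 4:12),
                       stringsAsFactors = FALSE)

# do.call() spreads the nine names over the nine positional arguments
apply_set <- function(x, d) do.call(toy_index, c(list(x), as.list(unname(d))))

res <- sapply(var_sets, apply_set, x = one_year)
res
```

The same apply_set wrapper plays the role that myindex2 plays in the simple example, so it could be dropped into the nested lapply() call unchanged.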

Upvotes: 1

InfiniteFlash

Reputation: 1058

This answer will be edited!

Currently, your function myindex() does this:

myindex <- function(x, dimX, dimY){
  econ_i<- ( x[dimX]+  x[dimY] ) 
  return ( (1/length(econ_i))*sum(econ_i) )
}

Aren't you after this, however?

myindex <- function(x, dimX, dimY){
  econ_i<- ( x[,dimX]+  x[,dimY] ) 
  return ( (1/length(econ_i))*sum(econ_i) )
}

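The way you have it right now, length(econ_i) always returns 1, because econ_i is a one-column data.frame() and not a vector. The length of a data.frame() is its number of columns (here 1), while the length of a vector is the number of elements in it. As a result, your version returns the plain sum instead of the average over persons.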

For reference, here is what the output looks like in R.

df1["var1"]
  var1
1    1
2    1
3    1

returns a data.frame()

df1[,"var1"]
[1] 1 1 1

returns a vector.
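To make the consequence concrete, here is a small sketch comparing both ways of indexing on df1 from the question. The object names econ_df, econ_vec, index_df and index_vec are only for illustration:

```r
df1 <- data.frame(var1 = c(1, 1, 1), var2 = c(0, 0, 1), var3 = c(1, 1, 0))

econ_df  <- df1["var2"] + df1["var3"]      # one-column data.frame
econ_vec <- df1[, "var2"] + df1[, "var3"]  # numeric vector

length(econ_df)   # counts columns
length(econ_vec)  # counts elements (one per person)

# Same formula as in myindex(), applied to each version
index_df  <- (1 / length(econ_df))  * sum(econ_df)   # divides by 1: plain sum
index_vec <- (1 / length(econ_vec)) * sum(econ_vec)  # divides by 3: average
c(index_df = index_df, index_vec = index_vec)
```

If the average over persons is what you want, the vector version is equivalent to mean(econ_vec).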

I will adjust this post to answer your question when you respond. I think it's important to solve this part first.

Upvotes: 1
