user1638567

Reputation: 79

lapply in R - function to each column

I have a question that I believe requires lapply in R (though I am open to other solutions).

I have a dataset (the code to generate it is pasted below) with multiple permutations of a binary variable, which results in a Y for each permutation. I am trying to run a model that uses the X1-X75 variables as predictors of each Y variable. This will ultimately be an imputation model, so the first step I need is simply to subset the data so that I get a separate dataset per permutation, e.g., Y.control.perm1, X1...X75; Y.control.perm2, X1...X75.

The trouble I am having is how to do this in an apply statement. I can't seem to get both the column of interest AND the predictors into the same dataset. Here is the code I have, where cont selects only the control columns and ob identifies the rows of interest. In this case, I want 100 datasets (or a stacked dataset), each with its own Y.control.perm column (perm1 through perm100) and with X1-X75 appearing in all of them.

nperm=100 #number of permutations
start=p+2+nperm
cont=seq(start+1,start+nperm*2+2,by=2) #grabbing columns of interest

test=lapply(d[which(d$ob==0),c(cont,1:p)], function(x){
              names(x)
              txt.imp=as.data.frame(x[c(cont,1:p)])
     })
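The attempt above runs into a basic property of lapply(): when it is given a data frame, it iterates over the columns and passes each one to the function as a plain vector. Inside the function there is no data frame left, so names(x) is NULL and x[c(cont,1:p)] picks elements of that single vector rather than columns. A minimal illustration on a small made-up data frame (toy is a hypothetical stand-in, not the simulated data):

```r
# Hypothetical stand-in for the real data: two outcomes and one predictor
toy <- data.frame(Y1 = c(1, 2, 3), Y2 = c(4, 5, 6), X1 = c(0, 1, 0))

# lapply() over a data frame passes each column to the function as a plain vector
col_classes <- lapply(toy, class)
print(col_classes)      # every element is "numeric"

# Inside the function the column names are gone, so names(x) is NULL
col_names <- lapply(toy, function(x) names(x))
print(col_names)        # every element is NULL

# Indexing a vector by position selects elements, not columns;
# positions past its length come back as NA
print(toy$Y1[c(2, 5)])  # 2 NA
```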

The question boils down to: How can I use lapply (or a similar function) to apply a function to a subset of columns in a dataset, with each element of the list being a different column of the dataset?

This is the data generation code:

p=75
N=10
seed=342

# FUNCTION TO GENERATE ONE SIM #
dataGen = function(N, p, seed){
      set.seed(2398)   # fixed seed: X1-X75 come out the same on every call
      X=rbinom(N*p,1,.5)
      df=data.frame(matrix(X,nrow=N,ncol=p))
      df$obs.txt=rep(0:1,N/2)
      x.for.perm=df$obs.txt
      perm=NULL

      for(i in 1:100){
            perm.i=sample(x.for.perm)   # random permutation of the treatment indicator
            perm=as.matrix(cbind(perm,perm.i))
      }

      df$TE=-1.3*df$X1-1.2*df$X2-.6*df$X3+.3*df$X4+.5*df$X5+1.1*df$X6+1.2*df$X7
      df=as.data.frame(cbind(df,perm))

      set.seed(seed)   # seed the outcome draws below
      col.vec=c(p+1,(p+3):(p+102))   # obs.txt (col 76) plus the 100 permuted treatment columns (78:177)
      df.out<-lapply(df[,col.vec],function(x){
           y.obs.control=rnorm(N,0,1)   #observed y value under control
           df$y.obs.tx=ifelse(x==1,(y.obs.control+df$TE),NA)  #observed y value under TX
           #df$Y=ifelse(df$obs.txt==0,df$y.obs.control,df$y.obs.tx)  #observed Y value
           df$y.obs.control=ifelse(x==0,y.obs.control,NA)  #observed y value under control
           cbind(df$y.obs.control,df$y.obs.tx)
      })

      df2=do.call(cbind,df.out)

      names=c("y.obs.control","y.obs.tx")

      for(i in 1:100){
            names.i=c(paste("y.obs.control.p.",i,sep=""),paste("y.obs.tx.p.",i,sep=""))
            names=c(names,names.i)
      }

      colnames(df2)<-(names)
      df2=as.data.frame(df2)

      df2$ob=rep(0:1,each=N/2)
      df2$sim=rep(length(seed),each=N)
      df2=as.data.frame(cbind(df,df2))

      return(df2)
}

d=dataGen(10,75,43)
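Because cont is built from hard-coded index arithmetic, it is worth confirming which columns it actually selects. The sketch below rebuilds the column layout that dataGen() produces (assuming p = 75 and 100 permutations, as above) without simulating any data; the permuted-treatment columns get placeholder names since only their positions matter here. With the real object, the same check is simply names(d)[cont].

```r
p <- 75       # number of X predictors
nperm <- 100  # number of permutations

# Column layout of d as assembled by dataGen(); data values are irrelevant here
x_cols    <- paste0("X", 1:p)            # X1..X75
perm_cols <- rep("perm", nperm)          # placeholder names for the permuted treatments
y_cols    <- c("y.obs.control", "y.obs.tx",
               as.vector(rbind(paste0("y.obs.control.p.", 1:nperm),
                               paste0("y.obs.tx.p.", 1:nperm))))  # interleaved control/tx
all_cols  <- c(x_cols, "obs.txt", "TE", perm_cols, y_cols, "ob", "sim")

# Index arithmetic from the question
start <- p + 2 + nperm
cont  <- seq(start + 1, start + nperm * 2 + 2, by = 2)

selected <- all_cols[cont]
print(length(selected))   # 101: the observed control column plus 100 permutations
print(head(selected, 3))  # "y.obs.control" "y.obs.control.p.1" "y.obs.control.p.2"
```

So cont does pick only control outcomes, but it yields 101 columns rather than 100, because the unpermuted y.obs.control is included as well.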

Upvotes: 0

Views: 394

Answers (2)

lmo

Reputation: 38500

Here is the lapply version of @hack-r's answer which will return a list containing the same data.frames as constructed in that answer.

# return a list of data.frames
myList <- lapply(cont, function(i) d[d$ob==0, c(i, 1:75)])
# add names to the list
names(myList) <- paste0("dataset", cont)
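A self-contained sketch of the same lapply() pattern on a small made-up data frame (toy, y_idx and x_idx are hypothetical stand-ins for d, cont and 1:75). Naming the list by the outcome column names rather than by column numbers makes each element easier to identify, and the list can then go through a second lapply() to fit one model per outcome. lm() is used here only for illustration; it is not the imputation model from the question.

```r
set.seed(1)
# Hypothetical data: two outcome columns, two predictors, and a row filter 'ob'
toy <- data.frame(y.a = rnorm(8), y.b = rnorm(8),
                  X1 = rnorm(8), X2 = rnorm(8),
                  ob = rep(0:1, each = 4))
y_idx <- 1:2   # plays the role of cont
x_idx <- 3:4   # plays the role of 1:75

# One data frame per outcome: that outcome plus all predictors, rows with ob == 0
per_outcome <- lapply(y_idx, function(i) toy[toy$ob == 0, c(i, x_idx)])
names(per_outcome) <- names(toy)[y_idx]

# Fit one model per data frame; the first column is treated as the response
fits <- lapply(per_outcome, function(dd) {
  lm(reformulate(names(dd)[-1], response = names(dd)[1]), data = dd)
})
```

On the real data, the equivalent naming step would be names(myList) <- names(d)[cont]. By default lm() drops rows whose outcome is NA, which matters here because each y.obs.control column is NA for treated units.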

You may be interested in taking a look at @gregor's answer to this question for some nice tips on working with data.frames stored in lists.

Upvotes: 0

Hack-R

Reputation: 23214

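This will create one dataset per column index in cont, named after that index (dataset178, dataset180, ..., dataset378; 101 datasets with the cont from the question), each with one of the Y variables and the 75 X variables of interest: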

for(i in cont){
  nam <- paste("dataset", i, sep = "")
  assign(nam, d[d$ob==0,c(i,1:75)])
}
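assign() leaves each dataset as a separate object in the workspace. If they are later needed together, mget() can gather them back into a named list (with the real objects, mget(paste0("dataset", cont))), and do.call(rbind, ...) can stack them into the single long dataset mentioned in the question. Because rbind() matches data-frame columns by name, the outcome column needs a common name before stacking. The sketch below uses two hypothetical toy objects; the dataset, outcome and y column names are illustrative choices, not part of either answer.

```r
# Hypothetical stand-ins for objects created by the assign() loop:
# each has its own outcome column followed by the shared predictors
dataset178 <- data.frame(y.obs.control = c(0.5, NA), X1 = c(0, 1))
dataset180 <- data.frame(y.obs.control.p.1 = c(NA, 1.2), X1 = c(0, 1))

# Gather the separate objects back into one named list
dfs <- mget(c("dataset178", "dataset180"))

# Stack them: give the outcome a common name and record where each block came from
stacked <- do.call(rbind, lapply(names(dfs), function(nm) {
  dd <- dfs[[nm]]
  data.frame(dataset = nm, outcome = names(dd)[1], y = dd[[1]], dd[-1])
}))
print(stacked)
```

Working with the list directly, as in the lapply() answer above, avoids the assign()/mget() round trip altogether.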

Upvotes: 0
