smishra

Reputation: 13

Printing a column name inside lapply function

I have searched through the archives but have not found a suitable answer. I am a beginner, so please excuse my ignorance if this is a very elementary query. I am trying to get the lapply function to print the column names while processing a data frame. I understand that lapply passes each column of the data frame as a vector, but is there a way to print the column name along with the output? For example:

   > mydata<-data.frame(matrix(rep(c(1:2),times= 50),20,5))
   > colnames(mydata)<-letters[1:5]
   > lapply(mydata[,2:4],function(x){CrossTable(x,mydata[,5])})

I want the output to show the name of the column being processed when the table is printed. Right now the contingency tables only show "x".

Upvotes: 1

Views: 2379

Answers (2)

Elena

Reputation: 155

OK, this is old, but I came across the same problem and wanted to share my approach, although it violates the *apply idea to some extent. The upside is that you can put anything inside the loop. I needed to run an ANOVA on 2 output variables against each column I looped through with lapply, get the p-values to annotate the plot, and create multiple plots side by side. The core idea is to combine a for loop with lapply:

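library(broom)     # tidy(), assumed to come from broom
library(ggplot2)   # ggplot(), geom_boxplot(), geom_text()
library(gridExtra) # grid.arrange()

for (i in 1:11){ # loop over the first 11 columns of df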
lapply(df[i],function(x) {
  myfactor<-names(df)[i] #gets the column name
  anova_model_a<-lm(a~x,df) #needed to run ANOVA per column
  anova_model_b<-lm(b~x,df) #needed to run ANOVA per column
  tab_aov_a<-tidy(summary(anova_model_a)) #proper result table
  tab_aov_b<-tidy(summary(anova_model_b)) #proper result table
  labels_a <- data.frame(drv = "1", label=c(round(tab_aov_a$p.value[2],4))) #needed for labelling the graph. I only had 2 groups for comparison
  labels_b <- data.frame(drv = "1", label=c(round(tab_aov_b$p.value[2],4))) #needed for labelling the graph
  fig1<-ggplot(df,aes(x,a))+
    geom_boxplot()+
    ggtitle("a")+
    geom_text(data=labels_a,aes(x=drv,y=12,label=label),colour="blue",angle=0,hjust=0.5, vjust=0.5,size=5)+
    xlab(myfactor)

  fig2<-ggplot(df,aes(x,b))+
    geom_boxplot()+
    ggtitle("b")+
    geom_text(data=labels_b,aes(x=drv,y=6,label=label),colour="blue",angle=0,hjust=0.5, vjust=0.5,size=5)+
    xlab(myfactor)
  arrangement<-grid.arrange(fig1,fig2,nrow=2)
  print(arrangement)
})
}
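
If only the name and the values of each column are needed, the outer for loop can be dropped by iterating over the column names directly. This is a minimal sketch in base R, not the code above: the toy data frame and the names g1, g2 and group_cols are made up for illustration, and the p-value is read from summary() instead of tidy().

```r
# Toy data (hypothetical): one outcome `a` and two two-level grouping columns
set.seed(1)
df <- data.frame(
  a  = rnorm(20),
  g1 = rep(c("x", "y"), each = 10),
  g2 = rep(c("x", "y"), times = 10)
)

group_cols <- c("g1", "g2")

# Iterate over the names; each call knows the name and can look up the column
p_values <- sapply(group_cols, function(nm) {
  fit <- lm(reformulate(nm, response = "a"), data = df)  # a ~ g1, then a ~ g2
  summary(fit)$coefficients[2, "Pr(>|t|)"]               # p-value of the group effect
})

print(p_values)  # named numeric vector, one p-value per grouping column
```

Because sapply keeps the names of a character vector, each p-value stays attached to the column it came from, and nm can be reused as an axis label.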

Upvotes: 0

Sandy Muspratt

Reputation: 32789

Assuming that the CrossTable() function is the one from the descr package, it seems that its dnn argument sets the row and column names in the cross-tabulation. The trick is to get lapply to read both the names and the data. names(mydata)[2:4] gives the names; mydata[, 2:4] is the data. The syntax for lapply is:

lapply(x, FUN, ...)

FUN is applied to each element of x, and ... allows optional arguments to be passed to FUN. Thus, both names(mydata)[2:4] and mydata[, 2:4] can be passed to FUN.
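
As a small illustration of how ... forwards an extra argument, here is a sketch in base R. The helper scale_by and its argument k are hypothetical names used only for this example.

```r
# Hypothetical helper: multiplies a vector by k
scale_by <- function(x, k) x * k

# `k = 10` is not an element of the list; lapply forwards it to every call
res <- lapply(list(a = 1:3, b = 4:6), scale_by, k = 10)

print(res)
```

Each list element becomes the first argument of scale_by, while k is the same in every call. The same mechanism is what lets a whole data frame ride along as a second argument.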

mydata<-data.frame(matrix(rep(c(1:2),times= 50),20,5))
colnames(mydata)<-letters[1:5]

library(descr)

lapply(names(mydata)[2:4], 
   function(dfNames, dfData) {
      return(CrossTable(dfData[[dfNames]], mydata[,5], dnn = c(dfNames, "mydata[,5]")))
}, mydata[, 2:4] )

The function operates on each element of names(mydata)[2:4], and the data frame is passed as an additional argument. This way, the relevant column (dfData[[dfNames]]) and its name (dfNames) are both available to CrossTable.

[[1]]
   Cell Contents 
|-------------------------|
|                       N | 
| Chi-square contribution | 
|           N / Row Total | 
|           N / Col Total | 
|         N / Table Total | 
|-------------------------|

===============================
          mydata[,5]
b             1       2   Total
-------------------------------
1            10       0      10
          5.000   5.000        
          1.000   0.000   0.500
          1.000   0.000        
          0.500   0.000        
-------------------------------
2             0      10      10
          5.000   5.000        
          0.000   1.000   0.500
          0.000   1.000        
          0.000   0.500        
-------------------------------
Total        10      10      20
          0.500   0.500
===============================

[[2]]
   Cell Contents 
|-------------------------|
|                       N | 
| Chi-square contribution | 
|           N / Row Total | 
|           N / Col Total | 
|         N / Table Total | 
|-------------------------|

===============================
          mydata[,5]
c             1       2   Total
-------------------------------
1            10       0      10
          5.000   5.000        
          1.000   0.000   0.500
          1.000   0.000        
          0.500   0.000        
-------------------------------
2             0      10      10
          5.000   5.000        
          0.000   1.000   0.500
          0.000   1.000        
          0.000   0.500        
-------------------------------
Total        10      10      20
          0.500   0.500
===============================

[[3]]
   Cell Contents 
|-------------------------|
|                       N | 
| Chi-square contribution | 
|           N / Row Total | 
|           N / Col Total | 
|         N / Table Total | 
|-------------------------|

===============================
          mydata[,5]
d             1       2   Total
-------------------------------
1            10       0      10
          5.000   5.000        
          1.000   0.000   0.500
          1.000   0.000        
          0.500   0.000        
-------------------------------
2             0      10      10
          5.000   5.000        
          0.000   1.000   0.500
          0.000   1.000        
          0.000   0.500        
-------------------------------
Total        10      10      20
          0.500   0.500
===============================
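
The same pattern can be tried without installing descr. The sketch below swaps CrossTable() for base R's table(), which also takes a dnn argument, so it only shows counts and not the proportions above. The variable names cols and tabs are made up for this example.

```r
mydata <- data.frame(matrix(rep(1:2, times = 50), 20, 5))
colnames(mydata) <- letters[1:5]

cols <- names(mydata)[2:4]

# Iterate over the names, look each column up by name, and label the table
tabs <- lapply(cols, function(nm) {
  table(mydata[[nm]], mydata[["e"]], dnn = c(nm, "e"))
})

# Name the list so it prints as $b, $c, $d instead of [[1]], [[2]], [[3]]
names(tabs) <- cols

print(tabs)
```

Naming the list elements is optional, but it makes the column being processed visible in the list headers as well as in the table labels.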

Upvotes: 0
