Jose Victor Zambrana

Reputation: 519

R function returning a data.frame using for loop

I would like to write a function that uses a for loop to create multiple datasets. These datasets should be combined into a single data frame, which the function returns.

I wrote the following code. It works when the for loop is outside a function, but not when the loop is inside one. The problem is that my function only returns the dataset from the first iteration (i = 1).

library(broom)
library(dplyr)

# My function
validation <- function(x, y) {
  df <- NULL
  for (i in 1:ncol(x)) {
    coln <- colnames(x)[i]
    covariate <- as.vector(x[,i])
    models <- (tidy(glm(y ~ covariate, data = x, family = binomial)))
    df <- (rbind(df, cbind(models, coln))) %>% filter( term != "(Intercept)")
    return(df)
  }
}

# Test function
validation(mtcars, mtcars$am)

term         estimate  std.error  statistic        p.value coln
covariate   0.3070282   0.1148416   2.673493    0.007506579 mpg

This function should give me the following output:

  term     estimate    std.error     statistic     p.value coln
1  covariate  0.307028190 1.148416e-01  2.6734932353 0.007506579  mpg
2  covariate -0.691175096 2.536145e-01 -2.7252982408 0.006424343  cyl
3  covariate -0.014604292 5.167837e-03 -2.8259972293 0.004713367 disp
4  covariate -0.008117121 6.074337e-03 -1.3362973916 0.181452089   hp
5  covariate  5.577358500 2.062575e+00  2.7040753425 0.006849476 drat
6  covariate -4.023969940 1.436416e+00 -2.8013963535 0.005088198   wt
7  covariate -0.288189820 2.278968e-01 -1.2645629995 0.206028024 qsec
8  covariate  0.693147181 7.319250e-01  0.9470194188 0.343628884   vs
9  covariate 51.132135568 7.774641e+04  0.0006576784 0.999475249   am
10 covariate 21.006490452 3.876257e+03  0.0054192724 0.995676067 gear
11 covariate  0.073173343 2.254018e-01  0.3246350695 0.745457282 carb

Upvotes: 2

Views: 894

Answers (1)

akrun

Reputation: 887048

If we move return(df) out of the loop and place it after the loop has finished, it works. With return(df) inside the loop, the function exits during the first iteration, so 'df' only holds the output of the first run.

validation <- function(x, y) {
  df <- NULL
  for (i in 1:ncol(x)) {
    coln <- colnames(x)[i]
    covariate <- as.vector(x[,i])
    models <- (tidy(glm(y ~ covariate, data = x, family = binomial)))
    df <- (rbind(df, cbind(models, coln))) %>% filter( term != "(Intercept)")
    # to understand it better, print the state in each iteration
    print(sprintf("column index : %d", i))
    print('-----------------')
    print('df in each loop')
    print(df)
    print(sprintf("%dth loop ends", i))
  }
  df
}
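
The effect of where return() sits can be seen in isolation with a toy example. The two functions below (early_return and late_return are hypothetical names, not from the question) differ only in whether the result is returned inside or after the loop:

```r
# Hypothetical toy functions, used only to illustrate return() placement.
early_return <- function(n) {
  out <- NULL
  for (i in seq_len(n)) {
    out <- c(out, i^2)
    return(out)  # exits the whole function during the first iteration
  }
}

late_return <- function(n) {
  out <- NULL
  for (i in seq_len(n)) {
    out <- c(out, i^2)
  }
  out  # evaluated once, after every iteration has run
}

early_return(3)
# [1] 1
late_return(3)
# [1] 1 4 9
```

return() leaves the enclosing function, not just the current iteration, which is why the original validation() stops after the first column.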

Checking:

validation(mtcars, mtcars$am)
#       term     estimate    std.error     statistic     p.value coln
#1  covariate  0.307028190 1.148416e-01  2.6734932353 0.007506579  mpg
#2  covariate -0.691175096 2.536145e-01 -2.7252982408 0.006424343  cyl
#3  covariate -0.014604292 5.167837e-03 -2.8259972293 0.004713367 disp
#4  covariate -0.008117121 6.074337e-03 -1.3362973916 0.181452089   hp
#5  covariate  5.577358500 2.062575e+00  2.7040753425 0.006849476 drat
#6  covariate -4.023969940 1.436416e+00 -2.8013963535 0.005088198   wt
#7  covariate -0.288189820 2.278968e-01 -1.2645629995 0.206028024 qsec
#8  covariate  0.693147181 7.319250e-01  0.9470194188 0.343628884   vs
#9  covariate 51.132135568 7.774641e+04  0.0006576784 0.999475249   am
#10 covariate 21.006490452 3.876257e+03  0.0054192724 0.995676067 gear
#11 covariate  0.073173343 2.254018e-01  0.3246350695 0.745457282 carb
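
As a possible alternative, the rows can be collected with lapply() and bound once at the end, which avoids growing 'df' inside the loop. This is only a sketch: validation_list is a hypothetical name, and it assumes x is a data frame and y is a 0/1 vector with one value per row of x.

```r
library(broom)

# Hypothetical variant of validation(); a sketch, not the asker's original code.
validation_list <- function(x, y) {
  fits <- lapply(colnames(x), function(coln) {
    covariate <- x[[coln]]
    # Fit one logistic regression per column and tidy the coefficients
    model <- tidy(glm(y ~ covariate, family = binomial))
    # Drop the intercept row, keep the covariate row(s)
    model <- model[model$term != "(Intercept)", ]
    cbind(as.data.frame(model), coln = coln)
  })
  # Bind all per-column results into a single data frame
  do.call(rbind, fits)
}

res <- validation_list(mtcars, mtcars$am)
```

On mtcars this should give one row per column, in column order. As with the loop version, glm() may warn about fitted probabilities of 0 or 1 for columns such as am, which predicts itself perfectly.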

Upvotes: 2
