Lime
Lime

Reputation: 754

Subsetting a function variable within another variable

I have the following function that performs a step-wise linear regression, and it works well with numerical and integer values, although, when I have factors as independent variables, I get the following error:

Error in [.data.frame(d, , names(resul0)) : undefined columns selected

The layout of the function:

stepfor(bird$Richness, data.frame(GARDENSIZE, Site, season), alfa = 0.2)

I have figured out a way that splits the factors into columns and assigns them respective values following the comments, given by this:

   x <- function(x) {x %>% 
            select(where(negate(is.numeric))) %>% 
            map_dfc(~ model.matrix(~ .x -1) %>% 
                        as_tibble) %>% 
            rename_all(~ str_remove(., "\\.x")) 
}

Though, I'm not sure how I can include it into the function below, so that x can be implemented with the function below by calling it stepfor likeso:

stepfor(bird$Richness, data.frame(x(bird)), alfa = 0.2)

I just want to know how to include the function x within the function below to have it work like above. And if there aren't any factors in the data, then set the function as FALSE so it doesn't return an error like x is missing.

Here is my function:

stepfor<-function (y = y, d = d, alfa = 0.05)
{
    pval <- NULL
    design <- NULL
    j = 1
    resul0 <- summary(lm(y ~ ., data = d))$coefficients[, 4]
    d <- as.data.frame(d[, names(resul0)][-1])
    for (i in 1:ncol(d)) {
        sub <- cbind(design, d[, i])
        sub <- as.data.frame(sub)
        lm2 <- lm(y ~ ., data = sub)
        result <- summary(lm2)
        pval[i] <- result$coefficients[, 4][j + 1]
    }
    min <- min(pval)
    while (min < alfa) {
        b <- pval == min
        c <- c(1:length(pval))
        pos <- c[b]
        pos <- pos[!is.na(pos)][1]
        design <- cbind(design, d[, pos])
        design <- as.data.frame(design)
        colnames(design)[j] <- colnames(d)[pos]
        j = j + 1
        d <- as.data.frame(d[, -pos])
        pval <- NULL
        if (ncol(d) != 0) {
            for (i in 1:ncol(d)) {
                sub <- cbind(design, d[, i])
                sub <- as.data.frame(sub)
                lm2 <- lm(y ~ ., data = sub)
                result <- summary(lm2)
                pval[i] <- result$coefficients[, 4][j + 1]
            }
            min <- min(pval, na.rm = TRUE)
        }
        else min <- 1
    }
    if (is.null(design)) {
        lm1 <- lm(y ~ 1)
    }
    else {
        lm1 <- lm(y ~ ., data = design)
    }
    return(lm1)
}

Reproducible code:

bird<- structure(list(season = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L), .Label = c("Summer", 
"Winter"), class = "factor"), Richness = c(20L, 17L, 18L, 19L, 
11L, 15L, 17L, 15L, 15L, 9L, 13L, 14L, 12L, 18L, 30L, 30L, 17L, 
25L, 32L, 32L, 29L, 29L, 27L, 18L, 25L, 24L, 15L, 18L, 23L, 22L, 
25L, 22L, 22L, 23L, 17L, 22L, 7L, 15L, 16L, 20L, 24L, 21L, 22L, 
39L, 17L, 17L, 13L, 26L, 25L, 20L), GARDEN_SIZE = structure(c(1L, 
1L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 3L, 1L, 1L, 1L, 
1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 3L, 3L, 2L, 2L, 
1L), .Label = c("L", "M", "S"), class = "factor"), Site = structure(c(1L, 
1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L), .Label = c("R", "S", "U"), class = "factor")), row.names = c(NA, 
50L), class = "data.frame")

Upvotes: 0

Views: 100

Answers (1)

Sirius
Sirius

Reputation: 5429

Consider this:


stepfor<-function (y = y, d = d, alfa = 0.05)
{

    # split the incoming data to give non-numeric the factor treatment:
    x1 <- d %>% select(where(negate(is.numeric))) %>%
        map_dfc(~ model.matrix(~ .x -1) %>%
                    as_tibble) %>%
        rename_all(~ str_remove(., "\\.x"))
    x2 <- d %>% select(where(is.numeric))

    d <- cbind( x1, x2 )

    pval <- NULL
    design <- NULL
    j = 1
    resul0 <- summary(lm(y ~ ., data = d))$coefficients[, 4][-1]
    d <- as.data.frame(d[, names(resul0)])
 
    # rest of function body as is

}


eg. move the [-1] from the 5th line to the 4th to remove the intercept term earlier. The reamining coefficients shouldnow match the ones you have in your data.frame and names(resul0) should all exist in your data.frame

There is a problem with your approach to tackle this. You do:

d <- as.data.frame( d[, names(resul0) ] [-1] )

This code tries to look up all of the names(resul0) inside the d data.frame. This includes the intercept term, and this therefore fails. (And at this point its too late to remove the intercept afterwards as the damage has already been done)

You need to remove the intercept before looking up the names inside d. Then the name-error won't happen.

The body of the x function can be inserted in there, quite straight forward.

Upvotes: 1

Related Questions