Reputation: 123

How to extract the name of a column from a data frame to be used in the loop?

I would like to copy the text of a data frame's column names one-by-one in a for loop. My code seems to return NULL values from the column name argument.

More broadly, I want to create a summary by factor of each of several columns.

# Create an example data frame
df <- data.frame( c( "a", "b", "c", "b", "c"), c( 6, 4, 10, 9, 11), c( 1, 3, 5, 3, 6))

colnames(df) <- c( "Group", "Num.Hats", "Num.Balls")

example data frame with each group member's number of hats and number of balls

Now I want to loop over columns two and three, creating a data object storing the summary statistics by Group. The point is to get a look at how groups A, B, and C differ from one another with respect to balls and with respect to hats.

My code looks like this:

# Evaluate stats of each group
for (i in 2:3){
    assign(paste0("Eval.", colnames(df[[i]])), tapply(df[,i], df$Group, summary))
}

I am getting a single object called "Eval." With the summary statistics for Num.Balls. To be clear, I would like two objects, one called Eval.Num.Hats and one called Eval.Num.Balls.

If colnames() cannot be used in this way, is there another function to achieve my desired result? Alternatively, I'd be open to another solution if the loop is not required.

Upvotes: 2

Answers (3)

Jonathan V. Solórzano

Reputation: 5620

Here is another solution without any loops, using tidyr and broom.

library(tidyr)
library(broom)

df %>%
  #Change from wide to long format
  pivot_longer(cols = c("Num.Hats","Num.Balls"),
               names_to = "Var") %>%
  #group by Group (a,b,c) and Var (Num.Hats, Num.Balls)
  group_by(Group, Var) %>%
  #Calculate the summary function for each group
  do(tidy(summary(.$value)))

# A tibble: 6 x 8
# Groups:   Group, Var [6]
#  Group Var    minimum    q1 median  mean    q3 maximum
#  <fct> <chr>    <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
#1 a     Num.B~       1  1       1     1    1          1
#2 a     Num.H~       6  6       6     6    6          6
#3 b     Num.B~       3  3       3     3    3          3
#4 b     Num.H~       4  5.25    6.5   6.5  7.75       9
#5 c     Num.B~       5  5.25    5.5   5.5  5.75       6
#6 c     Num.H~      10 10.2    10.5  10.5 10.8       11

Upvotes: 1

M--

Reputation: 28945

You can avoid a for-loop altogether.

Explanation:

Here, using lapply I am looping over all columns (using their names) to be summarized, except the first one which is used for grouping (see what names(df1)[-1] returns).

with function basically attaches the dataframe so you don't need to do dataframe$column and you can simply type the column name.

by(variable to function, grouping variable, function) is used to apply summary by group.

We need to use the column name as variable and not character. That's why I am using mget() to convert the character name of the column to the variable.

smry.ls.df1 <- lapply(names(df1)[-1], function(col) with(df1, by(mget(col), Group, summary)))
names(smry.ls.df1) <- paste0("Eval.", names(df1)[-1]) #setting the names as you've shown

smry.list.df1

#> $Eval.Num.Hats
#> Group: a
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       6       6       6       6       6       6 
#> -------------------------------------------------------- 
#> Group: b
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    4.00    5.25    6.50    6.50    7.75    9.00 
#> -------------------------------------------------------- 
#> Group: c
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   10.00   10.25   10.50   10.50   10.75   11.00 
#> 
#> $Eval.Num.Balls
#> Group: a
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       1       1       1       1       1       1 
#> -------------------------------------------------------- 
#> Group: b
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       3       3       3       3       3       3 
#> -------------------------------------------------------- 
#> Group: c
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    5.00    5.25    5.50    5.50    5.75    6.00

If you want them to be saved as separate objects (not recommended) you can use list2env:

list2env(smry.list.df1, globalenv())

Data:

df1 <- data.frame(Group = c( "a", "b", "c", "b", "c"), 
                  Num.Hats = c( 6, 4, 10, 9, 11), 
                  Num.Balls = c( 1, 3, 5, 3, 6))

Upvotes: 2

akrun

Reputation: 887153

The df[[i]] is extracting the column as a vector and there are no colnames. We can either use df[i] or the correct option is colnames(df)[i]

for (i in 2:3){
    assign(paste0("Eval.", colnames(df)[i]), tapply(df[,i], df$Group, summary))
 }

-output

Eval.Num.Hats
#$a
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#      6       6       6       6       6       6 

#$b
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   4.00    5.25    6.50    6.50    7.75    9.00 

#$c
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#  10.00   10.25   10.50   10.50   10.75   11.00 

Eval.Num.Balls
#$a
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#      1       1       1       1       1       1 

#$b
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#      3       3       3       3       3       3 

#$c
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   5.00    5.25    5.50    5.50    5.75    6.00

Upvotes: 2

How to extract the name of a column from a data frame to be used in the loop?

Answers (3)

Related Questions