Reputation: 123
I would like to retrieve a data frame's column names, one by one, inside a for loop. My code seems to get NULL back from the column name call.
More broadly, I want to create a summary by factor of each of several columns.
# Create an example data frame
df <- data.frame( c( "a", "b", "c", "b", "c"), c( 6, 4, 10, 9, 11), c( 1, 3, 5, 3, 6))
colnames(df) <- c( "Group", "Num.Hats", "Num.Balls")
Now I want to loop over columns two and three, creating a data object that stores the summary statistics by Group. The point is to see how groups a, b, and c differ from one another with respect to hats and with respect to balls.
My code looks like this:
# Evaluate stats of each group
for (i in 2:3){
assign(paste0("Eval.", colnames(df[[i]])), tapply(df[,i], df$Group, summary))
}
I am getting a single object called "Eval." with the summary statistics for Num.Balls. To be clear, I would like two objects, one called Eval.Num.Hats and one called Eval.Num.Balls.
If colnames() cannot be used in this way, is there another function to achieve my desired result? Alternatively, I'd be open to another solution if the loop is not required.
Upvotes: 2
Views: 1493
Reputation: 5620
Here is another solution without any loops, using tidyr, dplyr and broom.
library(tidyr)
library(dplyr) # group_by() and do() come from dplyr, not tidyr
library(broom)
df %>%
#Change from wide to long format
pivot_longer(cols = c("Num.Hats","Num.Balls"),
names_to = "Var") %>%
#group by Group (a,b,c) and Var (Num.Hats, Num.Balls)
group_by(Group, Var) %>%
#Calculate the summary function for each group
do(tidy(summary(.$value)))
# A tibble: 6 x 8
# Groups: Group, Var [6]
# Group Var minimum q1 median mean q3 maximum
# <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 a Num.B~ 1 1 1 1 1 1
#2 a Num.H~ 6 6 6 6 6 6
#3 b Num.B~ 3 3 3 3 3 3
#4 b Num.H~ 4 5.25 6.5 6.5 7.75 9
#5 c Num.B~ 5 5.25 5.5 5.5 5.75 6
#6 c Num.H~ 10 10.2 10.5 10.5 10.8 11
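For comparison, the same reshape-then-summarise idea can be sketched in base R with no packages. This is a swapped-in technique, not part of the answer above: stack() stands in for pivot_longer() (its value and name columns are called values and ind), and aggregate() applies summary() to each Group/variable combination.

```r
df <- data.frame(Group     = c("a", "b", "c", "b", "c"),
                 Num.Hats  = c(6, 4, 10, 9, 11),
                 Num.Balls = c(1, 3, 5, 3, 6))

# Wide to long: stack() returns the values column by column,
# so Group is repeated once per measured column
measure_cols <- c("Num.Hats", "Num.Balls")
long <- data.frame(Group = rep(df$Group, times = length(measure_cols)),
                   stack(df[measure_cols]))

# summary() for every Group x variable combination; the six statistics
# end up as a matrix column named "values"
res <- aggregate(values ~ Group + ind, data = long, FUN = summary)
res
```

The result holds the same numbers as the tibble above, just in a base-R layout.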
Upvotes: 1
Reputation: 28945
You can avoid a for-loop altogether.
Explanation:
Here, using lapply(), I am looping over all the columns to be summarized (by name), except the first one, which is used for grouping (see what names(df1)[-1] returns).
The with() function evaluates an expression inside an environment built from the data frame, so you don't need to write dataframe$column and can simply type the column name.
by(data, grouping variable, function) is used to apply summary() to each group.
We need the column itself, not its name as a character string. That's why I am using mget() to look up the column from its character name.
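To see what mget() actually returns, here is a small check run outside with(). Passing envir = as.environment(df1) explicitly is an assumption made only so the snippet stands alone. The column name is just a string until it is looked up; mget() returns a named list holding the column's values, while df1[[col]] gives the same values as a bare vector.

```r
df1 <- data.frame(Group     = c("a", "b", "c", "b", "c"),
                  Num.Hats  = c(6, 4, 10, 9, 11),
                  Num.Balls = c(1, 3, 5, 3, 6))

col <- "Num.Hats"
col                      # only the string "Num.Hats", not the data

# Look the name up among df1's columns: a named list with one element
looked_up <- mget(col, envir = as.environment(df1))
str(looked_up)

# Plain [[ ]] indexing returns the same values as an unnamed vector
identical(looked_up[[1]], df1[[col]])   # TRUE
```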
smry.list.df1 <- lapply(names(df1)[-1], function(col) with(df1, by(mget(col), Group, summary)))
names(smry.list.df1) <- paste0("Eval.", names(df1)[-1]) # set the names as you've shown
smry.list.df1
#> $Eval.Num.Hats
#> Group: a
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 6 6 6 6 6 6
#> --------------------------------------------------------
#> Group: b
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 4.00 5.25 6.50 6.50 7.75 9.00
#> --------------------------------------------------------
#> Group: c
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 10.00 10.25 10.50 10.50 10.75 11.00
#>
#> $Eval.Num.Balls
#> Group: a
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1 1 1 1 1 1
#> --------------------------------------------------------
#> Group: b
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 3 3 3 3 3 3
#> --------------------------------------------------------
#> Group: c
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 5.00 5.25 5.50 5.50 5.75 6.00
If you want them to be saved as separate objects (not recommended), you can use list2env:
list2env(smry.list.df1, globalenv())
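A sketch of what list2env() does, writing into a scratch environment made with new.env() instead of globalenv() so the workspace stays clean. The list here is rebuilt with tapply() purely as a stand-in for smry.list.df1; each list element becomes its own object, named after its list name.

```r
df1 <- data.frame(Group     = c("a", "b", "c", "b", "c"),
                  Num.Hats  = c(6, 4, 10, 9, 11),
                  Num.Balls = c(1, 3, 5, 3, 6))

# Stand-in named list with the same names as smry.list.df1
smry <- lapply(df1[-1], function(x) tapply(x, df1$Group, summary))
names(smry) <- paste0("Eval.", names(smry))

# One object per list element, created in the scratch environment
scratch <- new.env()
list2env(smry, envir = scratch)

ls(scratch)              # "Eval.Num.Balls" "Eval.Num.Hats"
```

Keeping the list instead means the same result stays reachable as smry[["Eval.Num.Hats"]] and can still be looped over later.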
Data:
df1 <- data.frame(Group = c( "a", "b", "c", "b", "c"),
Num.Hats = c( 6, 4, 10, 9, 11),
Num.Balls = c( 1, 3, 5, 3, 6))
Upvotes: 2
Reputation: 887153
df[[i]] extracts the column as a plain vector, and a vector has no colnames, so colnames(df[[i]]) returns NULL. We can either use colnames(df[i]) (df[i] is still a one-column data frame) or, more directly, colnames(df)[i]:
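A quick check of the difference, using the df from the question. paste0() treats the zero-length NULL as an empty string, so both loop iterations assign to the same name, "Eval.", and the second one (Num.Balls) overwrites the first.

```r
df <- data.frame(Group     = c("a", "b", "c", "b", "c"),
                 Num.Hats  = c(6, 4, 10, 9, 11),
                 Num.Balls = c(1, 3, 5, 3, 6))

colnames(df[[2]])                    # NULL: df[[2]] is a bare numeric vector
colnames(df[2])                      # "Num.Hats": df[2] is a one-column data frame
colnames(df)[2]                      # "Num.Hats": 2nd element of all column names

paste0("Eval.", colnames(df[[2]]))   # "Eval.": the NULL contributes nothing
```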
for (i in 2:3){
assign(paste0("Eval.", colnames(df)[i]), tapply(df[,i], df$Group, summary))
}
Output:
Eval.Num.Hats
#$a
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 6 6 6 6 6 6
#$b
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 4.00 5.25 6.50 6.50 7.75 9.00
#$c
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 10.00 10.25 10.50 10.50 10.75 11.00
Eval.Num.Balls
#$a
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 1 1 1 1 1 1
#$b
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 3 3 3 3 3 3
#$c
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 5.00 5.25 5.50 5.50 5.75 6.00
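A variant of the same loop that iterates over the column names themselves rather than over indices, so there is no separate name lookup to get wrong; df[[col]] pulls each column by name. It still creates the two objects with assign(), as the question asked.

```r
df <- data.frame(Group     = c("a", "b", "c", "b", "c"),
                 Num.Hats  = c(6, 4, 10, 9, 11),
                 Num.Balls = c(1, 3, 5, 3, 6))

# Loop over every column except the grouping column
for (col in setdiff(names(df), "Group")) {
  # col is already the name, e.g. "Num.Hats"
  assign(paste0("Eval.", col), tapply(df[[col]], df$Group, summary))
}

Eval.Num.Hats[[2]]   # summary for group "b"
```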
Upvotes: 2