Reputation: 103
I want to summarize a set of data in a data frame using dplyr.
Concerning the "vars" argument, the documentation reads:
A list of columns generated by vars(), a character vector of column names, a numeric vector of column positions, or NULL.
I see the following behavior depending on the type of the "vars" argument:
summarize_at(vars(D8,D9,D10), mean, na.rm=TRUE) # works
summarize_at(c("D8","D9","D10"), mean, na.rm=TRUE) # works
summarize_at(c(12,13,14), mean, na.rm=TRUE) # Using column indexes for D8, D9 and D10, respectively
# ! Can't convert a character `NA` to a symbol.
summarize_at(c(12:14), mean, na.rm=TRUE) # Same error as c(12,13,14)
Why am I getting that error?
POST EDIT: Adding data and actual code
Data:
# A tibble: 12 x 5
TTMENT DOSE D8 D9 D10
<chr> <dbl> <dbl> <dbl> <dbl>
1 Group_1 0 40.3 41.1 41.5
2 Group_1 0 37.4 36.9 37.1
3 Group_1 0 44.8 44.1 44.4
4 Group_2 450 39.6 39.6 39.4
5 Group_2 450 40.6 41.2 40.8
6 Group_2 450 41.1 42.1 41.2
7 Group_3 500 38.5 39.2 39.9
8 Group_3 500 41.6 41.6 41.5
9 Group_3 500 41.8 41.8 42.4
10 Group_4 700 43.6 42 42.4
11 Group_4 700 43.1 42.7 42.7
12 Group_4 700 41.6 40.8 41.9
Error triggering code:
stack %>%
group_by(TTMENT, DOSE) %>%
#summarize_at(c("D8","D9","D10"), mean, na.rm=TRUE)
#summarize_at(vars(D8,D9,D10), mean, na.rm=TRUE)
summarize_at(c(3,4,5), mean, na.rm=TRUE)
Full error:
Error in `FUN()`:
! Can't convert a character `NA` to a symbol.
Backtrace:
What I actually want is an output showing the mean, SD and SE in 3 rows per group (rather than in columns) and, if possible, an asterisk next to the mean when a t-test between that group and the reference group (Group 1) is significant. Something like this:
Group Statistic D8 D9 D10
Group_1 Mean XX XX XX
Group_1 SD XX XX XX
Group_1 SE XX XX XX
Group_2 Mean XX* XX XX*
Group_2 SD XX XX XX
Group_2 SE XX XX XX
Group_3 etc.
Any ideas on how to achieve this?
Upvotes: 1
Views: 2315
Reputation: 103
Just posting an answer as I found an explanation (newbie topic though...)
Apparently, after group_by the grouping columns are left out when summarize_at counts column positions. Therefore, given the dataframe:
# A tibble: 12 x 5
TTMENT DOSE D8 D9 D10
<chr> <dbl> <dbl> <dbl> <dbl>
1 Group_1 0 40.3 41.1 41.5
2 Group_1 0 37.4 36.9 37.1
3 Group_1 0 44.8 44.1 44.4
4 Group_2 450 39.6 39.6 39.4
5 Group_2 450 40.6 41.2 40.8
6 Group_2 450 41.1 42.1 41.2
7 Group_3 500 38.5 39.2 39.9
8 Group_3 500 41.6 41.6 41.5
9 Group_3 500 41.8 41.8 42.4
10 Group_4 700 43.6 42 42.4
11 Group_4 700 43.1 42.7 42.7
12 Group_4 700 41.6 40.8 41.9
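The following code fails because it uses column indexes 3, 4 and 5 for columns D8, D9 and D10, i.e. their positions in the full data frame. With TTMENT and DOSE left out, only three columns remain, so positions 4 and 5 do not exist: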
results <- stack %>%
group_by(TTMENT, DOSE) %>%
summarize_at(c(3:5), mean, na.rm=TRUE)
Results in error:
Error in `FUN()`:
! Can't convert a character `NA` to a symbol.
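To illustrate where the `NA` may come from, here is a base R sketch of the position lookup as I understand it. This is an assumption about what happens internally, not dplyr's actual code; the variable names `all_cols`, `group_cols` and `non_group` are made up for the illustration.

```r
# Column names of the example data frame
all_cols   <- c("TTMENT", "DOSE", "D8", "D9", "D10")
group_cols <- c("TTMENT", "DOSE")

# Assumption: positions are looked up among the non-grouping columns only
non_group <- setdiff(all_cols, group_cols)

non_group[3:5]  # "D10" NA NA  -> the NA names cannot be turned into symbols
non_group[1:3]  # "D8" "D9" "D10"
```

Positions 4 and 5 fall outside the three remaining columns, which would explain the character `NA` in the error message.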
In contrast, the following code gives the expected result because it uses column indexes 1, 2 and 3 for columns D8, D9 and D10 respectively. That is, TTMENT and DOSE are ignored when counting positions:
results <- stack %>%
group_by(TTMENT, DOSE) %>%
summarize_at(c(1:3), mean, na.rm=TRUE)
Result:
# A tibble: 4 x 5
# Groups: TTMENT [4]
TTMENT DOSE D8 D9 D10
<chr> <dbl> <dbl> <dbl> <dbl>
1 Group_1 0 40.8 40.7 41
2 Group_2 450 40.4 41.0 40.5
3 Group_3 500 40.6 40.9 41.3
4 Group_4 700 42.8 41.8 42.3
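Since summarize_at is superseded by across() in dplyr 1.0.0 and later, selecting the columns by name avoids the position counting altogether. A minimal sketch, assuming the data frame is rebuilt from the values in the question and dplyr >= 1.0.0 is installed:

```r
library(dplyr)

# Data copied from the question
stack <- data.frame(
  TTMENT = rep(paste0("Group_", 1:4), each = 3),
  DOSE   = rep(c(0, 450, 500, 700), each = 3),
  D8  = c(40.3, 37.4, 44.8, 39.6, 40.6, 41.1, 38.5, 41.6, 41.8, 43.6, 43.1, 41.6),
  D9  = c(41.1, 36.9, 44.1, 39.6, 41.2, 42.1, 39.2, 41.6, 41.8, 42.0, 42.7, 40.8),
  D10 = c(41.5, 37.1, 44.4, 39.4, 40.8, 41.2, 39.9, 41.5, 42.4, 42.4, 42.7, 41.9)
)

# Columns are selected by name, so grouping does not shift anything
means <- stack %>%
  group_by(TTMENT, DOSE) %>%
  summarise(across(c(D8, D9, D10), ~ mean(.x, na.rm = TRUE)), .groups = "drop")

print(means)
```

Selecting by name also keeps working if columns are later added or reordered.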
Thanks to @jpiversen, whose comment helped me understand what was going on.
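As for the second part of the question (mean, SD and SE in three rows per group), one possible approach is to reshape to long format, summarise, and reshape back. This is only a sketch under assumptions: it needs dplyr and tidyr, the names `day`, `value`, `stat` and `long_stats` are made up here, SE is taken as SD divided by the square root of the number of non-missing values, and the t-test asterisks are not covered.

```r
library(dplyr)
library(tidyr)

# Data copied from the question
stack <- data.frame(
  TTMENT = rep(paste0("Group_", 1:4), each = 3),
  DOSE   = rep(c(0, 450, 500, 700), each = 3),
  D8  = c(40.3, 37.4, 44.8, 39.6, 40.6, 41.1, 38.5, 41.6, 41.8, 43.6, 43.1, 41.6),
  D9  = c(41.1, 36.9, 44.1, 39.6, 41.2, 42.1, 39.2, 41.6, 41.8, 42.0, 42.7, 40.8),
  D10 = c(41.5, 37.1, 44.4, 39.4, 40.8, 41.2, 39.9, 41.5, 42.4, 42.4, 42.7, 41.9)
)

long_stats <- stack %>%
  # One row per group, day and measurement
  pivot_longer(c(D8, D9, D10), names_to = "day", values_to = "value") %>%
  group_by(TTMENT, day) %>%
  summarise(
    Mean = mean(value, na.rm = TRUE),
    SD   = sd(value, na.rm = TRUE),
    SE   = SD / sqrt(sum(!is.na(value))),
    .groups = "drop"
  ) %>%
  # Turn the three statistics into rows, then spread the days back into columns
  pivot_longer(c(Mean, SD, SE), names_to = "Statistic", values_to = "stat") %>%
  pivot_wider(names_from = day, values_from = stat) %>%
  select(TTMENT, Statistic, D8, D9, D10)

print(long_stats)
```

The result has one row per group and statistic, with D8, D9 and D10 as columns, which matches the layout sketched in the question.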
Upvotes: 3