Pisuke

Reputation: 103

R dplyr summarize_at: numeric vector of column positions results in "Can't convert a character NA to a symbol" - Summary stats output with t-test

I want to summarize a set of columns in a data frame using dplyr.

Concerning the "vars" argument, the documentation reads:

A list of columns generated by vars(), a character vector of column names, a numeric vector of column positions, or NULL.

I have the following behavior depending on the type of "vars" argument:

summarize_at(vars(D8,D9,D10), mean, na.rm=TRUE)     # works
summarize_at(c("D8","D9","D10"), mean, na.rm=TRUE)  # works
summarize_at(c(12,13,14), mean, na.rm=TRUE)         # Using column indexes for D8, D9 and D10, respectively
                                                    # ! Can't convert a character `NA` to a symbol.
summarize_at(c(12:14), mean, na.rm=TRUE)            # Same error as c(12,13,14)

Why am I getting that error?

POST EDIT: Adding data and actual code

Data:

    # A tibble: 12 x 5
   TTMENT   DOSE    D8    D9   D10
   <chr>   <dbl> <dbl> <dbl> <dbl>
 1 Group_1     0  40.3  41.1  41.5
 2 Group_1     0  37.4  36.9  37.1
 3 Group_1     0  44.8  44.1  44.4
 4 Group_2   450  39.6  39.6  39.4
 5 Group_2   450  40.6  41.2  40.8
 6 Group_2   450  41.1  42.1  41.2
 7 Group_3   500  38.5  39.2  39.9
 8 Group_3   500  41.6  41.6  41.5
 9 Group_3   500  41.8  41.8  42.4
10 Group_4   700  43.6  42    42.4
11 Group_4   700  43.1  42.7  42.7
12 Group_4   700  41.6  40.8  41.9

Error triggering code:

stack %>%
  group_by(TTMENT, DOSE) %>%
  #summarize_at(c("D8","D9","D10"), mean, na.rm=TRUE)
  #summarize_at(vars(D8,D9,D10), mean, na.rm=TRUE)
  summarize_at(c(3,4,5), mean, na.rm=TRUE)

Full error:

Error in `FUN()`:
! Can't convert a character `NA` to a symbol.
Backtrace:

  1. stack %>% group_by(TTMENT, DOSE) %>% ...
  2. dplyr::summarize_at(., c(3, 4, 5), mean, na.rm = TRUE)
  3. dplyr:::manip_at(...)
  4. dplyr:::tbl_at_syms(.tbl, .vars, .include_group_vars = .include_group_vars)
  5. rlang::syms(vars)
  6. rlang:::map(x, sym)
  7. base::lapply(.x, .f, ...)
  8. rlang FUN(X[[i]], ...) Error in FUN(X[[i]], ...) :

What I actually want is an output showing the mean, SD and SE in three rows per group (rather than in columns) and, if possible, an asterisk next to the mean when a t-test between that group and the reference group (Group_1) is significant. Something like this:

Group    Statistic     D8    D9    D10
Group_1   Mean         XX    XX     XX
Group_1   SD           XX    XX     XX
Group_1   SE           XX    XX     XX
Group_2   Mean         XX*   XX     XX*
Group_2   SD           XX    XX     XX
Group_2   SE           XX    XX     XX
Group_3  etc.

Any ideas on how to achieve this?

Upvotes: 1

Views: 2315

Answers (1)

Pisuke

Reputation: 103

Just posting an answer as I found an explanation (newbie topic though...)

Apparently, once the data is grouped with group_by, the grouping columns are left out when summarize_at counts column positions: position 1 is the first non-grouping column. Therefore, given the dataframe:

     # A tibble: 12 x 5
   TTMENT   DOSE    D8    D9   D10
   <chr>   <dbl> <dbl> <dbl> <dbl>
 1 Group_1     0  40.3  41.1  41.5
 2 Group_1     0  37.4  36.9  37.1
 3 Group_1     0  44.8  44.1  44.4
 4 Group_2   450  39.6  39.6  39.4
 5 Group_2   450  40.6  41.2  40.8
 6 Group_2   450  41.1  42.1  41.2
 7 Group_3   500  38.5  39.2  39.9
 8 Group_3   500  41.6  41.6  41.5
 9 Group_3   500  41.8  41.8  42.4
10 Group_4   700  43.6  42    42.4
11 Group_4   700  43.1  42.7  42.7
12 Group_4   700  41.6  40.8  41.9

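The following code fails because it assumes column indexes 3, 4 and 5 for columns D8, D9 and D10 respectively. With TTMENT and DOSE excluded, only three columns are left to index, so positions 4 and 5 point at nothing and come back as NA: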

results <- stack %>%
  group_by(TTMENT, DOSE) %>%
  summarize_at(c(3:5), mean, na.rm=TRUE)

Results in error:

Error in `FUN()`:
! Can't convert a character `NA` to a symbol.

In contrast, the following code gives the expected result, because it uses column indexes 1, 2 and 3 for columns D8, D9 and D10 respectively. That is, TTMENT and DOSE are ignored when counting positions:

results <- stack %>%
  group_by(TTMENT, DOSE) %>%
  summarize_at(c(1:3), mean, na.rm=TRUE)

Result:

# A tibble: 4 x 5
# Groups:   TTMENT [4]
  TTMENT   DOSE    D8    D9   D10
  <chr>   <dbl> <dbl> <dbl> <dbl>
1 Group_1     0  40.8  40.7  41  
2 Group_2   450  40.4  41.0  40.5
3 Group_3   500  40.6  40.9  41.3
4 Group_4   700  42.8  41.8  42.3
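
As a rough illustration of the point above (a sketch, not dplyr's actual internals), the data can be rebuilt from the printed tibble and the positions looked up among the non-grouping columns only. The name non_group_cols is made up for this example. Selecting by name avoids the position shift altogether; the across() form assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Example data rebuilt from the tibble printed in the question.
stack <- tibble::tibble(
  TTMENT = rep(paste0("Group_", 1:4), each = 3),
  DOSE   = rep(c(0, 450, 500, 700), each = 3),
  D8  = c(40.3, 37.4, 44.8, 39.6, 40.6, 41.1, 38.5, 41.6, 41.8, 43.6, 43.1, 41.6),
  D9  = c(41.1, 36.9, 44.1, 39.6, 41.2, 42.1, 39.2, 41.6, 41.8, 42.0, 42.7, 40.8),
  D10 = c(41.5, 37.1, 44.4, 39.4, 40.8, 41.2, 39.9, 41.5, 42.4, 42.4, 42.7, 41.9)
)

# Hypothetical helper vector: the columns left once the grouping columns are set aside.
non_group_cols <- setdiff(names(stack), c("TTMENT", "DOSE"))

non_group_cols[1:3]  # D8, D9, D10
non_group_cols[3:5]  # D10, then two NAs: positions 4 and 5 do not exist

# Selecting by name does not depend on positions at all (assumes dplyr >= 1.0.0).
by_name <- stack %>%
  group_by(TTMENT, DOSE) %>%
  summarize(across(c(D8, D9, D10), ~ mean(.x, na.rm = TRUE)), .groups = "drop")
```

The two NA entries in the second lookup are consistent with the "Can't convert a character `NA` to a symbol" message, since the backtrace shows the selected names being turned into symbols.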

Thanks to @jpiversen, whose comment helped me understand what was going on.
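
For the second part of the question (mean, SD and SE in rows, with an asterisk for a significant t-test against Group_1), one possible sketch is below. It is only one way to do it and rests on several assumptions: tidyr 1.0.0 or later is installed, the test is base R's default Welch t-test, the cutoff is 0.05, and no multiple-comparison correction is applied. The names long, stats and out are made up for this example.

```r
library(dplyr)
library(tidyr)

# Example data rebuilt from the tibble printed in the question.
stack <- tibble::tibble(
  TTMENT = rep(paste0("Group_", 1:4), each = 3),
  DOSE   = rep(c(0, 450, 500, 700), each = 3),
  D8  = c(40.3, 37.4, 44.8, 39.6, 40.6, 41.1, 38.5, 41.6, 41.8, 43.6, 43.1, 41.6),
  D9  = c(41.1, 36.9, 44.1, 39.6, 41.2, 42.1, 39.2, 41.6, 41.8, 42.0, 42.7, 40.8),
  D10 = c(41.5, 37.1, 44.4, 39.4, 40.8, 41.2, 39.9, 41.5, 42.4, 42.4, 42.7, 41.9)
)

# One row per measurement, so every day column is handled the same way.
long <- stack %>%
  pivot_longer(c(D8, D9, D10), names_to = "Day", values_to = "value")

# Mean, SD, SE per group and day, plus a Welch t-test p-value against Group_1.
stats <- long %>%
  group_by(TTMENT, Day) %>%
  summarize(
    Mean = mean(value, na.rm = TRUE),
    SD   = sd(value, na.rm = TRUE),
    SE   = SD / sqrt(sum(!is.na(value))),
    p    = if (TTMENT[1] == "Group_1") NA_real_ else
      t.test(value, long$value[long$TTMENT == "Group_1" & long$Day == Day[1]])$p.value,
    .groups = "drop"
  )

# Format the numbers as text, flag significant means, then put statistics in rows.
out <- stats %>%
  mutate(
    star = ifelse(!is.na(p) & p < 0.05, "*", ""),
    Mean = paste0(formatC(Mean, format = "f", digits = 1), star),
    SD   = formatC(SD, format = "f", digits = 2),
    SE   = formatC(SE, format = "f", digits = 2)
  ) %>%
  select(Group = TTMENT, Day, Mean, SD, SE) %>%
  pivot_longer(c(Mean, SD, SE), names_to = "Statistic", values_to = "value") %>%
  pivot_wider(names_from = Day, values_from = value) %>%
  select(Group, Statistic, D8, D9, D10)

print(out)
```

The values become character strings so the asterisk can sit next to the mean. If the numbers are needed for further calculations, it is probably better to keep stats as it is and only build out for display.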

Upvotes: 3
