I am trying to calculate the mean, median, min, and max of all variables, grouped by Site, using the summarise() function. In my code I replace NA with 0, but I am also open to using na.rm = TRUE instead if it is easy to incorporate.
I keep getting the following error message and cannot figure out what causes it:
Error: Problem with `summarise()` input `..2`.
i `..2 = list(mean, median, min, max)`.
x `..2` must be size 6 or 1, not 4.
i An earlier column had size 6.
i The error occurred in group 1: Site = 1.
Below is my data and code:
Dataset Reprex
data = structure(list(Site = c(7, 1, 7, 7, 1, 1, 7, 1, 6, 1, 1), OS_days = c(264,
208, 184, 145, 131, 116, 82, 74, 76, 82, 68), ster_days = c(241,
135, 184, NA, 85, 106, NA, NA, NA, NA, 69), pct_ster = c(0.912878787878788,
0.649038461538462, 1, NA, 0.648854961832061, 0.913793103448276,
NA, NA, NA, NA, 1.01470588235294), first_ster_days = c(28, 72,
1, NA, 42, 1, NA, NA, NA, NA, 1), tot_bev_days = c(1, 13, NA,
NA, NA, 75, NA, NA, NA, NA, NA), pct_bev = c(0.00378787878787879,
0.0625, NA, NA, NA, 0.646551724137931, NA, NA, NA, NA, NA), first_bev_days = c(48,
86, NA, NA, NA, 22, NA, NA, NA, NA, NA), SPD = structure(c(1219.86,
1107, 1508, 442.74, 524.61, 1733.76, 2079.77, 443.44, NA, 601.8,
1621.3), label = "Measurement Number 1 mm")), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"))
knitr::kable(data, digits = 3)
| Site| OS_days| ster_days| pct_ster| first_ster_days| tot_bev_days| pct_bev| first_bev_days| SPD|
|----:|-------:|---------:|--------:|---------------:|------------:|-------:|--------------:|-------:|
| 7| 264| 241| 0.913| 28| 1| 0.004| 48| 1219.86|
| 1| 208| 135| 0.649| 72| 13| 0.062| 86| 1107.00|
| 7| 184| 184| 1.000| 1| NA| NA| NA| 1508.00|
| 7| 145| NA| NA| NA| NA| NA| NA| 442.74|
| 1| 131| 85| 0.649| 42| NA| NA| NA| 524.61|
| 1| 116| 106| 0.914| 1| 75| 0.647| 22| 1733.76|
| 7| 82| NA| NA| NA| NA| NA| NA| 2079.77|
| 1| 74| NA| NA| NA| NA| NA| NA| 443.44|
| 6| 76| NA| NA| NA| NA| NA| NA| NA|
| 1| 82| NA| NA| NA| NA| NA| NA| 601.80|
| 1| 68| 69| 1.015| 1| NA| NA| NA| 1621.30|
Code
data %>%
replace(is.na(.), 0) %>%
group_by(Site) %>%
dplyr::summarise(across(c(OS_days, ster_days, pct_ster, first_ster_days, tot_bev_days, pct_bev, first_bev_days, SPD)), list(mean, median, min, max))
With many thanks to akrun for guiding me, here is a base R solution.
# function with all functions to apply
multi.fun <- function(x) {
c(mean = mean(x), median = median(x), min = min(x), max = max(x))
}
# replace NA with 0
data[is.na(data)] <- 0
# group by Site and apply function multi.fun
my_list <- lapply(split(data, data$Site), function(x) sapply(x, multi.fun))
# convert to df
do.call(rbind, my_list)
Output:
Site OS_days ster_days pct_ster first_ster_days tot_bev_days pct_bev first_bev_days SPD
mean 1 113.1667 65.83333 0.5377321 19.33333 14.66667 0.1181752874 18 1005.318
median 1 99.0000 77.00000 0.6489467 1.00000 0.00000 0.0000000000 0 854.400
min 1 68.0000 0.00000 0.0000000 0.00000 0.00000 0.0000000000 0 443.440
max 1 208.0000 135.00000 1.0147059 72.00000 75.00000 0.6465517241 86 1733.760
mean 6 76.0000 0.00000 0.0000000 0.00000 0.00000 0.0000000000 0 0.000
median 6 76.0000 0.00000 0.0000000 0.00000 0.00000 0.0000000000 0 0.000
min 6 76.0000 0.00000 0.0000000 0.00000 0.00000 0.0000000000 0 0.000
max 6 76.0000 0.00000 0.0000000 0.00000 0.00000 0.0000000000 0 0.000
mean 7 168.7500 106.25000 0.4782197 7.25000 0.25000 0.0009469697 12 1312.592
median 7 164.5000 92.00000 0.4564394 0.50000 0.00000 0.0000000000 0 1363.930
min 7 82.0000 0.00000 0.0000000 0.00000 0.00000 0.0000000000 0 442.740
max 7 264.0000 241.00000 1.0000000 28.00000 1.00000 0.0037878788 48 2079.770
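If you would rather drop missing values than count them as zeros (the question mentions na.rm = TRUE), the same split/sapply approach can use a variant of multi.fun. This is a sketch, not part of the original answer: the name multi.fun.narm is hypothetical, and only three columns of the question's data are used to keep it short. For Site 6, where ster_days is entirely NA, the mean becomes NaN, the median NA, and min/max Inf/-Inf (their warnings are suppressed here).

```r
# Hypothetical variant of multi.fun that drops NA instead of treating it as 0
multi.fun.narm <- function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    # min()/max() of an all-NA column warn and return Inf/-Inf
    min    = suppressWarnings(min(x, na.rm = TRUE)),
    max    = suppressWarnings(max(x, na.rm = TRUE)))
}

# Three columns taken from the question's data; NA values are kept as-is
sub <- data.frame(
  Site      = c(7, 1, 7, 7, 1, 1, 7, 1, 6, 1, 1),
  OS_days   = c(264, 208, 184, 145, 131, 116, 82, 74, 76, 82, 68),
  ster_days = c(241, 135, 184, NA, 85, 106, NA, NA, NA, NA, 69)
)

# Group by Site and apply the NA-aware summary to every column
my_list <- lapply(split(sub, sub$Site), function(x) sapply(x, multi.fun.narm))
res <- do.call(rbind, my_list)
res
```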
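The closing parenthesis of across() was placed too early. Without a function, across(c(...)) simply returns the selected columns (6 rows for Site = 1), and list(mean, median, min, max) ends up as a separate, unnamed second argument to summarise() (the ..2 in the error) of length 4, which cannot be recycled to 6 rows. Moving the parenthesis so that the list becomes the second argument of across() fixes it.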
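The mechanism can be reproduced on a hypothetical three-row toy tibble (toy, g and x are made up for illustration). With the parenthesis closed early, summarise() receives a 3-row column plus a length-2 list and cannot recycle them to a common size; with the list inside across(), each function becomes its own output column.

```r
library(dplyr)

# Hypothetical toy data: a single group with 3 rows
toy <- tibble(g = c(1, 1, 1), x = c(1, 2, 3))

# Parenthesis closed too early: across(x) has no functions, so it returns x
# itself (3 rows), and list(mean, max) is a separate argument of length 2
bad <- tryCatch(
  toy %>% group_by(g) %>% summarise(across(x), list(mean, max)),
  error = function(e) e
)
inherits(bad, "error")  # TRUE

# Correct: the list is the .fns argument of across(), giving x_1 and x_2
good <- toy %>% group_by(g) %>% summarise(across(x, list(mean, max)))
good
```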
library(dplyr)
data %>%
replace(is.na(.), 0) %>%
group_by(Site) %>%
dplyr::summarise(across(c(OS_days, ster_days, pct_ster,
first_ster_days, tot_bev_days, pct_bev, first_bev_days, SPD),
list(mean, median, min, max)))
Output:
# A tibble: 3 x 33
Site OS_days_1 OS_days_2 OS_days_3 OS_days_4 ster_days_1 ster_days_2 ster_days_3 ster_days_4 pct_ster_1 pct_ster_2 pct_ster_3 pct_ster_4 first_ster_days_1
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 113. 99 68 208 65.8 77 0 135 0.538 0.649 0 1.01 19.3
2 6 76 76 76 76 0 0 0 0 0 0 0 0 0
3 7 169. 164. 82 264 106. 92 0 241 0.478 0.456 0 1 7.25
# … with 19 more variables: first_ster_days_2 <dbl>, first_ster_days_3 <dbl>, first_ster_days_4 <dbl>, tot_bev_days_1 <dbl>, tot_bev_days_2 <dbl>,
# tot_bev_days_3 <dbl>, tot_bev_days_4 <dbl>, pct_bev_1 <dbl>, pct_bev_2 <dbl>, pct_bev_3 <dbl>, pct_bev_4 <dbl>, first_bev_days_1 <dbl>,
# first_bev_days_2 <dbl>, first_bev_days_3 <dbl>, first_bev_days_4 <dbl>, SPD_1 <dbl>, SPD_2 <dbl>, SPD_3 <dbl>, SPD_4 <dbl>
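Two optional refinements, sketched here assuming dplyr 1.0 or later: naming the list of functions makes across() produce columns such as OS_days_mean instead of OS_days_1, and wrapping each function in a lambda lets you pass na.rm = TRUE instead of replacing NA with 0. The example uses only three columns of the question's data; everything() is safe here because across() skips the grouping column Site.

```r
library(dplyr)

# Three columns taken from the question's data; NA values are kept as-is
data <- tibble(
  Site      = c(7, 1, 7, 7, 1, 1, 7, 1, 6, 1, 1),
  OS_days   = c(264, 208, 184, 145, 131, 116, 82, 74, 76, 82, 68),
  ster_days = c(241, 135, 184, NA, 85, 106, NA, NA, NA, NA, 69)
)

# Named list: across() names the outputs "{.col}_{.fn}", e.g. OS_days_mean
stats <- list(
  mean   = ~ mean(.x, na.rm = TRUE),
  median = ~ median(.x, na.rm = TRUE),
  min    = ~ min(.x, na.rm = TRUE),   # warns and returns Inf on all-NA input
  max    = ~ max(.x, na.rm = TRUE)    # warns and returns -Inf on all-NA input
)

res <- data %>%
  group_by(Site) %>%
  summarise(across(everything(), stats))
res
```

Keep in mind that the two approaches answer different questions: replacing NA with 0 gives a Site 1 mean of ster_days of 65.8 in the output above, while dropping NA gives 98.75. For Site 6, where ster_days is all NA, mean() returns NaN and min()/max() return Inf/-Inf with a warning.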