Reputation: 5
I'm working with a large dataframe 7191 obs. of 19 variables. The columns are Month, Day, Year, and Site1 through Site16. Where Month is either June, July, August, September, or October.
Here is the beginning of my data, which I believe has only numerals in the columns site1-site16. Currently, I'm double checking to make sure.
dput(head(No_PS_for_Calculations ))
structure(list(Month = c("June", "June", "June", "June", "June",
"June"), Day = c(1, 2, 3, 4, 5, 6), Year = c(1970, 1970, 1970,
1970, 1970, 1970), Site1 = c("11.531", "12.298", "12.732", "12.619",
"12.5", "13.201"), Site2 = c("11.185", "11.439", "12.17", "12.432",
"12.337", "12.492"), Site3 = c("11.147", "11.496", "11.645",
"12.208", "12.644", "12.971"), Site4 = c("11.393", "11.707",
"11.961", "12.135", "12.809", "13.041"), Site5 = c("11.797",
"11.925", "12.34", "12.525", "13.01", "13.548"), Site6 = c("11.853",
"11.974", "12.16", "12.481", "12.459", "12.838"), Site7 = c("12.319",
"12.46", "12.476", "12.729", "13.026", "13.032"), Site8 = c("12.557",
"12.643", "12.789", "12.975", "13.202", "13.339"), Site9 = c("12.774",
"13.337", "13.896", "13.897", "13.819", "14.054"), Site10 = c("12.819",
"13.202", "13.783", "14.298", "14.284", "14.309"), Site11 = c("13.151",
"13.556", "13.833", "14.08", "14.244", "14.841"), Site12 = c("13.61",
"13.57", "14.111", "14.073", "14.331", "14.849"), Site13 = c("13.802",
"13.872", "14.244", "14.249", "14.255", "14.818"), Site14 = c("14.138",
"14.275", "14.332", "14.522", "14.244", "14.927"), Site15 = c("14.138",
"14.616", "14.766", "14.697", "14.61", "14.694"), Site16 = c("14.208",
"14.627", "14.928", "14.829", "14.69", "14.762")), row.names = 151:156, class = "data.frame")
For my analysis I am interested in finding the 90th percentile for each month in each year. For example for 1970, I need the 90th percentile for June, July, August, September, and October. I've tried a few different ways but keep getting stuck in the same spot so I thought I'd ask for help.
result <- No_PS_for_Calculations %>%
group_by(Year, Month) %>%
summarise(across(Site1:`Site16`, quantile, probs = .9, .names = 'percent90_{col}'))
data.frame(result)
Which results in the following error:
Error: Problem with `summarise()` input `..1`.
i `..1 = across(Site1:Site16, quantile, probs = 0.9, .names = "percent90_{col}")`.
x non-numeric argument to binary operator
i The error occurred in group 1: Year = 1970, Month = "August".
I've been able to find the percentile grouped by month but now need to include year for further analysis.
What is the best way to get the 90th Percentiles presented by year and then month?
Thanks for the help!
Upvotes: 0
Views: 110
Reputation: 160437
It seems likely that you have something non-numeric in a column between Site1
and Site16
. Some fake data:
set.seed(42)
No_PS_for_Calculations <- data.frame(Year = rep(2020:2021, each = 3), Month = rep(c("Aug","Sep","Oct"), times = 2), Site1 = runif(6), Quux = sprintf("%0.03f", runif(6)), Site16 = runif(6))
No_PS_for_Calculations
# Year Month Site1 Quux Site16
# 1 2020 Aug 0.9148060 0.737 0.9346722
# 2 2020 Sep 0.9370754 0.135 0.2554288
# 3 2020 Oct 0.2861395 0.657 0.4622928
# 4 2021 Aug 0.8304476 0.705 0.9400145
# 5 2021 Sep 0.6417455 0.458 0.9782264
# 6 2021 Oct 0.5190959 0.719 0.1174874
No_PS_for_Calculations %>%
group_by(Year, Month) %>%
summarise(across(Site1:`Site16`, quantile, probs = .9, .names = 'percent90_{col}'))
+ > Error: Problem with `summarise()` input `..1`.
# x non-numeric argument to binary operator
# i Input `..1` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
# i The error occurred in group 1: Year = 2020, Month = "Aug".
If the non-numeric data ("Quux"
column here) is not meant to be summarized, then you can select the columns you need to avoid any confusion:
No_PS_for_Calculations %>%
select(Year, Month, starts_with("Site")) %>%
group_by(Year, Month) %>%
summarise(across(Site1:`Site16`, quantile, probs = .9, .names = 'percent90_{col}'))
# # A tibble: 6 x 4
# # Groups: Year [2]
# Year Month percent90_Site1 percent90_Site16
# <int> <chr> <dbl> <dbl>
# 1 2020 Aug 0.915 0.737
# 2 2020 Oct 0.286 0.657
# 3 2020 Sep 0.937 0.135
# 4 2021 Aug 0.830 0.705
# 5 2021 Oct 0.519 0.719
# 6 2021 Sep 0.642 0.458
Another cause might be if a legitimate Site
column is non-numeric, in which case you need to determine if you can easily convert to numeric. For instance, if "Quux"
here is instead named "Site2"
names(No_PS_for_Calculations)[4] <- "Site2"
then we can try to convert it inline:
No_PS_for_Calculations %>%
mutate(Site2 = as.numeric(Site2)) %>%
group_by(Year, Month) %>%
summarise(across(Site1:`Site16`, quantile, probs = .9, .names = 'percent90_{col}'))
# # A tibble: 6 x 5
# # Groups: Year [2]
# Year Month percent90_Site1 percent90_Site2 percent90_Site16
# <int> <chr> <dbl> <dbl> <dbl>
# 1 2020 Aug 0.915 0.737 0.935
# 2 2020 Oct 0.286 0.657 0.462
# 3 2020 Sep 0.937 0.135 0.255
# 4 2021 Aug 0.830 0.705 0.940
# 5 2021 Oct 0.519 0.719 0.117
# 6 2021 Sep 0.642 0.458 0.978
Of course, if there are non-number characters in there, you will get NA
s, which is easily fixed given filters, cleaners, or similar.
Upvotes: 1