Reputation: 107
I'm trying to calculate the 90th percentile of fecal coliform samples across all stations for each sample date, that is, across the columns of a data frame. Adding the result as a new column would be nice, but it is not strictly necessary.
I rearrange my data in the following way, although I don't know if this is necessary. It is easy for me to visualize this way.
library(dplyr)
FecalData <- RawData %>%
  select(Station, SampleDate, FecalColiform)
# Rearrange by station
library(reshape2)
FecalbyStation <- dcast(FecalData, SampleDate ~ Station, value.var = "FecalColiform", fun.aggregate = mean, na.rm = TRUE)
This leaves me with the following structure:
dput(FecalbyStation[1:5,])
structure(list(SampleDate = structure(c(6942, 6979, 7014, 7042,
7070), class = "Date"), `114` = c(114.5, 2, 17, 7.9, 1.8), `115` = c(41,
6.8, 33, 220, 4.5), `116` = c(64, 4, 14, 6.8, 1.8), `117` = c(33,
2, 4.5, 1.8, 2), `118` = c(81.5, 2, 6.8, 33, 1.8), `119` = c(28,
11, 4.5, 1.8, 2), `120` = c(64, 4.5, 11, 1.8, 1.8), `121` = c(31,
4.5, 3.6, 13, 2), `122` = c(41, 2, 33, 13, 1.8), `123` = c(28,
7.8, 2, 13, 1.8), `124` = c(NaN, 7.8, NaN, NaN, NaN), `125` = c(NaN,
NaN, NaN, NaN, NaN), `126` = c(NaN, NaN, NaN, NaN, NaN), `127` = c(NaN,
NaN, NaN, NaN, NaN), `128` = c(NaN, NaN, NaN, NaN, NaN), `129` = c(NaN,
NaN, NaN, NaN, NaN), `614` = c(NaN, NaN, NaN, NaN, NaN), `615` = c(NaN,
NaN, NaN, NaN, NaN), `639` = c(NaN, NaN, NaN, NaN, NaN), `758` = c(NaN,
NaN, NaN, NaN, NaN)), .Names = c("SampleDate", "114", "115",
"116", "117", "118", "119", "120", "121", "122", "123", "124",
"125", "126", "127", "128", "129", "614", "615", "639", "758"
), row.names = c(NA, 5L), class = "data.frame")
I have been able to compute row means this way, and I have tweaked that code over and over to try to get the 90th percentile instead, hitting several different errors along the way. Here is the code I have landed on:
library(psych)
Q90 <- sapply(FecalbyStation, -1, quantile, probs=c(.90), na.rm = TRUE)
This gives me the following error:
Error in match.fun(FUN) : '-1' is not a function, character or symbol
Ultimately, I would like to turn the resulting 90th percentiles into a time series so that I can run a Kendall test or a regression on it to investigate any trend in fecal levels for the region. Any suggestions or advice are much appreciated.
Thank you!
Upvotes: 3
Views: 1995
Reputation: 93811
You can keep your data in long form and get the 90th percentile by date as follows:
library(dplyr)
RawData %>% group_by(SampleDate) %>%
  summarise(p90 = quantile(FecalColiform, probs = 0.9, na.rm = TRUE))
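If you would rather work from the wide table in the question, the error there comes from `sapply()` taking its function as the second argument, so `-1` is being read as `FUN`. A minimal sketch of a row-wise alternative, assuming the `-1` was meant to drop the `SampleDate` column, is to use `apply()` over rows. The small data frame below reuses the first three rows of three station columns from the question's `dput()` output; it is only an illustration, not the full data.

```r
# Wide table: one row per sample date, one column per station.
# Values copied from the first three rows of the question's dput() output.
FecalbyStation <- data.frame(
  SampleDate = as.Date(c(6942, 6979, 7014), origin = "1970-01-01"),
  `114` = c(114.5, 2, 17),
  `115` = c(41, 6.8, 33),
  `124` = c(NaN, 7.8, NaN),
  check.names = FALSE  # keep the numeric station IDs as column names
)

# Drop the date column (column 1), then take the 90th percentile of each row.
# MARGIN = 1 means "by row"; na.rm = TRUE also discards the NaN cells.
Q90 <- apply(FecalbyStation[, -1], 1, quantile, probs = 0.9, na.rm = TRUE)

# Add the result as a new column next to the station columns.
FecalbyStation$Q90 <- Q90
```

This should give the same percentiles as the grouped `summarise()` whenever each station has at most one sample per date. If a station has several samples on the same date, the wide table averages them first, so the two results can differ.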
Upvotes: 2