Reputation: 15
I have a data.frame made up of daily temperatures per julian day for a number of stations.
Minimal reproducible example data.frame:
TemperatureData <- data.frame(
Julian_Day = 1:365,
Station_1 = c(rnorm(1:365, mean=10, sd=2)),
Station_2 = c(rnorm(1:365, mean=10, sd=2)),
Station_3 = c(rnorm(1:365, mean=10, sd=2))
)
I would like to determine the julian day on which each station exceeds a stated percentage of the total cumulative value, and for an output stating the julian day when this cumulative value threshold is reached for each station.
For example, say Station 1 has a total value of 4000, and after 180 julian days the cumulative value exceeds a set 50% threshold of the total value, and is repeated for each column of the data.frame (example of preferred output below).
Station_1 Station_2 Station_3
180 183 179
I assume this would utilise the cumsum function in some capacity, but not sure how to implement it. Can anyone help with this?
Let me know if this doesn't make sense.
Upvotes: 1
Views: 855
Reputation: 17868
Base R solution:
TemperatureData <- data.frame(
Julian_Day = 1:365,
Station_1 = c(rnorm(1:365, mean=10, sd=2)),
Station_2 = c(rnorm(1:365, mean=10, sd=2)),
Station_3 = c(rnorm(1:365, mean=10, sd=2))
)
TemperatureData$Station_1 <- cumsum(TemperatureData$Station_1) / sum(TemperatureData$Station_1)
TemperatureData$Station_2 <- cumsum(TemperatureData$Station_2) / sum(TemperatureData$Station_2)
TemperatureData$Station_3 <- cumsum(TemperatureData$Station_3) / sum(TemperatureData$Station_3)
results <- c(
"Station 1" = TemperatureData$Julian_Day[TemperatureData$Station_1 >= .5][1],
"Station 2" = TemperatureData$Julian_Day[TemperatureData$Station_2 >= .5][1],
"Station 3" = TemperatureData$Julian_Day[TemperatureData$Station_3 >= .5][1]
)
results
#> Station 1 Station 2 Station 3
#> 180 185 183
tidyverse
solution:
library(dplyr)
TemperatureData %>%
summarize(across(matches("Station"),
function(x) Julian_Day[cumsum(x) / sum(x) > .5][1]))
data.table
solution:
library(data.table)
setDT(TemperatureData)
TemperatureData[, lapply(.SD, function(x) Julian_Day[cumsum(x) / sum(x) > .5][1]),
.SDcols=patterns("Station")]
Upvotes: 3
Reputation: 5747
Here is a tidyverse
way to do it. I figure there is a simpler method, and if I figure it out, I will post it.
library(dplyr)
library(tidyr)
TemperatureData %>%
pivot_longer(cols = -Julian_Day, names_to = "Station") %>%
group_by(Station) %>%
arrange(Station, Julian_Day) %>%
mutate(cumpct = cumsum(value) / sum(value)) %>%
filter(cumpct >= 0.5) %>%
slice(1) %>%
pivot_wider(id_cols = 1, names_from = Station, values_from = Julian_Day)
# A tibble: 1 x 3
Station_1 Station_2 Station_3
<int> <int> <int>
1 184 181 181
Upvotes: 1