Reputation: 29074
I would like to do a conditional sum in R and I have a table such as this below. With this data, I would like to have a forward projection of total value per desk for next 5 days. Value should be included for the date started to the out_date.
+-------+------------+-------+-------+------------+------+
| Index | Date | Desk | Value | Out_date | Days |
+-------+------------+-------+-------+------------+------+
| 16 | 2020-07-30 | Desk1 | 1 | 2020-08-17 | 12 |
| 51 | 2020-08-13 | Desk2 | 2.000 | 2020-08-14 | 4 |
| 52 | 2020-08-13 | Desk3 | 2.000 | 2020-08-15 | 4 |
| 53 | 2020-08-13 | Desk3 | 2.000 | 2020-08-16 | 4 |
+-------+------------+-------+-------+------------+------+
How do I solve this?
How the output should like:
+-------+------------+------------+------------+------------+------------+
| Desk | 2020-08-14 | 2020-08-15 | 2020-08-16 | 2020-08-17 | 2020-08-18 |
+-------+------------+------------+------------+------------+------------+
| Desk1 | 1 | 1 | 1 | 1 | 0 |
| Desk2 | 2 | 0 | 0 | 0 | 0 |
| Desk3 | 4 | 4 | 2 | 0 | 0 |
+-------+------------+------------+------------+------------+------------+
Upvotes: 0
Views: 101
Reputation: 173858
From your description, it sounds as though each row in your table represents a Value
associated with a Desk
for a given period of time. The Value
associated with that desk starts on a particular Date
, and continues until the Out_date
. However, these associations can occur concurrently, which means that on any particular day, a desk may have several associated values. Your intention is to sum these values.
If my understanding is correct, then the following code will get you the relevant sums:
library(dplyr)
df %>%
mutate(Days = as.numeric(difftime(Out_date, Date, units = "day")) + 1) %>%
add_row(Index = max(df$Index) + 1, Date = max(df$Date),
Desk = "Desk1", Value = 0, Out_date = max(df$Date) + 1,
Days = 6) %>%
mutate(entry = seq(nrow(.)), n = Days) %>%
tidyr::uncount(Days) %>%
group_by(entry) %>%
mutate(Date_out = seq.Date(min(Date), length.out = max(n), by = "1 day")) %>%
group_by(Desk, Date_out) %>%
summarize(Value = sum(Value)) %>%
tidyr::pivot_wider(names_from = "Date_out", values_from = "Value") %>%
mutate_if(function(x) any(is.na(x)), function(x) replace(x, is.na(x), 0)) %>%
as.data.frame()
#> Desk 2020-07-30 2020-07-31 2020-08-01 2020-08-02 2020-08-03 2020-08-04
#> 1 Desk1 1 1 1 1 1 1
#> 2 Desk2 0 0 0 0 0 0
#> 3 Desk3 0 0 0 0 0 0
#> 2020-08-05 2020-08-06 2020-08-07 2020-08-08 2020-08-09 2020-08-10 2020-08-11
#> 1 1 1 1 1 1 1 1
#> 2 0 0 0 0 0 0 0
#> 3 0 0 0 0 0 0 0
#> 2020-08-12 2020-08-13 2020-08-14 2020-08-15 2020-08-16 2020-08-17 2020-08-18
#> 1 1 1 1 1 1 1 0
#> 2 0 2 2 0 0 0 0
#> 3 0 4 4 4 2 0 0
Data from question
df <- structure(list(Index = c(16L, 51L, 52L, 53L), Date = structure(c(18473,
18487, 18487, 18487), class = "Date"), Desk = c("Desk1", "Desk2",
"Desk3", "Desk3"), Value = c(1, 2, 2, 2), Out_date = structure(c(18491,
18488, 18489, 18490), class = "Date"), Days = c(12L, 4L, 4L,
4L)), row.names = c(NA, -4L), class = "data.frame")
Created on 2020-08-14 by the reprex package (v0.3.0)
Upvotes: 2
Reputation: 436
you can use dplyr and tidyr for this.
input <- tibble::tibble(Desk = c("Desk1",
"Desk2",
"Desk1",
"Desk3"),
Date = c("30.07.20",
"10.08.20",
"10.08.20",
"13.08.20"),
Value = c(0.006,
5.500,
0.300,
2.500))
input %>%
dplyr::group_by(Desk, Date) %>%
dplyr::summarise(sum_value = sum(Value)) %>%
dplyr::ungroup() %>%
tidyr::pivot_wider(names_from = Date, values_from = sum_value)
Upvotes: -1
Reputation: 5747
The dplyr
and tidyr
packages have what you need. Use group_by(Desk, Date)
and summarize(forecast = your_function)
. Then you can pivot_wider()
to get your desired output.
library(dplyr)
library(tidyr)
df %>%
group_by(Desk, Date) %>%
summarize(forecast = your_function) %>%
pivot_wider(names_from = "Date", values_from = "forecast")
Upvotes: 0