lakshmen
lakshmen

Reputation: 29074

Sumif in R with conditions

I would like to do a conditional sum in R and I have a table such as this below. With this data, I would like to have a forward projection of total value per desk for next 5 days. Value should be included for the date started to the out_date.

+-------+------------+-------+-------+------------+------+
| Index |    Date    | Desk  | Value |  Out_date  | Days |
+-------+------------+-------+-------+------------+------+
|    16 | 2020-07-30 | Desk1 | 1     | 2020-08-17 |   12 |
|    51 | 2020-08-13 | Desk2 | 2.000 | 2020-08-14 |    4 |
|    52 | 2020-08-13 | Desk3 | 2.000 | 2020-08-15 |    4 |
|    53 | 2020-08-13 | Desk3 | 2.000 | 2020-08-16 |    4 |
+-------+------------+-------+-------+------------+------+

How do I solve this?

How the output should like:

+-------+------------+------------+------------+------------+------------+
| Desk  | 2020-08-14 | 2020-08-15 | 2020-08-16 | 2020-08-17 | 2020-08-18 |
+-------+------------+------------+------------+------------+------------+
| Desk1 |          1 |          1 |      1     |      1     |       0    |
| Desk2 |          2 |          0 |      0     |      0     |       0    |
| Desk3 |          4 |          4 |      2     |      0     |       0    |
+-------+------------+------------+------------+------------+------------+

Upvotes: 0

Views: 101

Answers (3)

Allan Cameron
Allan Cameron

Reputation: 173858

From your description, it sounds as though each row in your table represents a Value associated with a Desk for a given period of time. The Value associated with that desk starts on a particular Date, and continues until the Out_date. However, these associations can occur concurrently, which means that on any particular day, a desk may have several associated values. Your intention is to sum these values.

If my understanding is correct, then the following code will get you the relevant sums:

library(dplyr)

df %>% 
  mutate(Days = as.numeric(difftime(Out_date, Date, units = "day")) + 1) %>%
  add_row(Index = max(df$Index) + 1, Date = max(df$Date), 
          Desk = "Desk1", Value = 0, Out_date = max(df$Date) + 1, 
          Days = 6) %>%
  mutate(entry = seq(nrow(.)), n = Days) %>% 
  tidyr::uncount(Days) %>%
  group_by(entry) %>%
  mutate(Date_out = seq.Date(min(Date), length.out = max(n), by = "1 day")) %>%
  group_by(Desk, Date_out) %>%
  summarize(Value = sum(Value)) %>%
  tidyr::pivot_wider(names_from = "Date_out", values_from = "Value") %>%
  mutate_if(function(x) any(is.na(x)), function(x) replace(x, is.na(x), 0)) %>%
  as.data.frame()

#>    Desk 2020-07-30 2020-07-31 2020-08-01 2020-08-02 2020-08-03 2020-08-04
#> 1 Desk1          1          1          1          1          1          1
#> 2 Desk2          0          0          0          0          0          0
#> 3 Desk3          0          0          0          0          0          0
#>   2020-08-05 2020-08-06 2020-08-07 2020-08-08 2020-08-09 2020-08-10 2020-08-11
#> 1          1          1          1          1          1          1          1
#> 2          0          0          0          0          0          0          0
#> 3          0          0          0          0          0          0          0
#>   2020-08-12 2020-08-13 2020-08-14 2020-08-15 2020-08-16 2020-08-17 2020-08-18
#> 1          1          1          1          1          1          1          0
#> 2          0          2          2          0          0          0          0
#> 3          0          4          4          4          2          0          0

Data from question

df <- structure(list(Index = c(16L, 51L, 52L, 53L), Date = structure(c(18473, 
18487, 18487, 18487), class = "Date"), Desk = c("Desk1", "Desk2", 
"Desk3", "Desk3"), Value = c(1, 2, 2, 2), Out_date = structure(c(18491, 
18488, 18489, 18490), class = "Date"), Days = c(12L, 4L, 4L, 
4L)), row.names = c(NA, -4L), class = "data.frame")

Created on 2020-08-14 by the reprex package (v0.3.0)

Upvotes: 2

janderkran
janderkran

Reputation: 436

you can use dplyr and tidyr for this.


input <- tibble::tibble(Desk = c("Desk1",
                                 "Desk2",
                                 "Desk1",
                                 "Desk3"), 
                        Date = c("30.07.20", 
                                 "10.08.20",
                                 "10.08.20",
                                 "13.08.20"), 
                        Value = c(0.006,
                                  5.500,
                                  0.300,
                                  2.500))

input %>%
  dplyr::group_by(Desk, Date) %>%
  dplyr::summarise(sum_value = sum(Value)) %>%
  dplyr::ungroup() %>%
  tidyr::pivot_wider(names_from = Date, values_from = sum_value)

Upvotes: -1

Ben Norris
Ben Norris

Reputation: 5747

The dplyr and tidyr packages have what you need. Use group_by(Desk, Date) and summarize(forecast = your_function). Then you can pivot_wider() to get your desired output.

library(dplyr)
library(tidyr)
df %>%
  group_by(Desk, Date) %>%
  summarize(forecast = your_function) %>%
  pivot_wider(names_from = "Date", values_from = "forecast")

Upvotes: 0

Related Questions