Zizou
Zizou

Reputation: 503

Subtracting values between columns in R

I need to create a new column by adding value from the stock column and subtracting the value from the sales column. When the result is negative, the next stock should be rewritten to the results as in the example. With the help of ThomasIsCoding the created code works well for sample data. But when I test the program for many products, there are erroneous calculations. Which shows below. Can you check why this is happening?

Code:

TD<-data.frame(product = rep("A",9), data = seq(as.Date("2020-01-01"), as.Date("2020-01-09"), by = "day"),
               sale = c(0, 5, 0, 0, 15, 1, 2, 1, 0), stock = c(0, 0, 50, 0,  10, 0, 10, 0, 100))
TD <- within(TD, result <- ave(stock-sale, product,
                              ave(stock-sale,
                               cumsum(stock>sale),FUN = cumsum)>0,FUN = cumsum))

> TD#works well
  product       data sale stock   result
1       A 2020-01-01    0     0        0
2       A 2020-01-02    5     0       -5
3       A 2020-01-03    0    50       50
4       A 2020-01-04    0     0       50
5       A 2020-01-05   15    10       45
6       A 2020-01-06    1     0       44
7       A 2020-01-07    2    10       52
8       A 2020-01-08    1     0       51
9       A 2020-01-09    0   100      151

Examples when it doesn't work: I don't understand why the program sometimes works well and sometimes it doesn't.

> TD[1330:1340,]#doesn't work
# A tibble: 4 x 5
# Groups:  data [4]
   product data       stock  sale   result      expected
   <chr>   <fct>      <dbl> <dbl>    <dbl>
 1 B       2020-02-25     0     0      -21           -21
 2 B       2020-02-26     0     0      -21           -21
 3 B       2020-02-27    60     4       35            56
 4 B       2020-02-28     0     2       33            54

> TD[7293:7297,]#works ok
# A tibble: 5 x 5
# Groups:   data [5]
   product  data       stock  sale   result      
   <chr>    <fct>      <dbl> <dbl>    <dbl>
 1 D        2020-01-13     0     2      -22
 2 D        2020-01-14     0     2      -24
 3 D        2020-01-15     0     0      -24
 4 D        2020-01-16   100     2       98
 5 D        2020-01-17     0     4       94


Upvotes: 0

Views: 97

Answers (1)

dc37
dc37

Reputation: 16178

OK, maybe this is a solution, definitely not the most elegant or readable but at least it passes all your "test" dataframes:

library(dplyr)
df %>% group_by(product) %>% mutate(New = stock - sale) %>%
  mutate(New2 = ifelse(lag((stock-sale) <0), stock, stock-sale)) %>%
  mutate(New3 = ifelse(is.na(New2),New, New2)) %>%
  mutate(New4 = ifelse(lag(New3 >= 0), lag(New3)+stock-sale,New3)) %>%
  mutate(New4 = ifelse(is.na(New4),stock-sale,New4)) %>%
  select(product, sale, stock, Result = New4)

So, now testing a dataframe with multiple groups:

TD2 = data.frame(product = c("AA","AA","AA","AA","AZ","AZ","AZ","AF","AF","AF","BA","BA","BA"), 
                 data = seq(as.Date("2020-01-01"), as.Date("2020-01-13"), by = "day"),
                 sale = c(0, 5, 0, 0, 15, 1, 2, 1, 0,10,0,1,10), 
                 stock = c(0, 0, 50, 0,  10, 0, 10, 0, 100,0,0,100,0))

TD2 %>% group_by(product) %>% mutate(New = stock - sale) %>%
  mutate(New2 = ifelse(lag((stock-sale) <0), stock, stock-sale)) %>%
  mutate(New3 = ifelse(is.na(New2),New, New2)) %>%
  mutate(New4 = ifelse(lag(New3 >= 0), lag(New3)+stock-sale,New3)) %>%
  mutate(New4 = ifelse(is.na(New4),stock-sale,New4)) %>%
  select(product, sale, stock, Result = New4)

# A tibble: 13 x 4
# Groups:   product [4]
   product  sale stock Result
   <fct>   <dbl> <dbl>  <dbl>
 1 AA          0     0      0
 2 AA          5     0     -5
 3 AA          0    50     50
 4 AA          0     0     50
 5 AZ         15    10     -5
 6 AZ          1     0      0
 7 AZ          2    10      8
 8 AF          1     0     -1
 9 AF          0   100    100
10 AF         10     0     90
11 BA          0     0      0
12 BA          1   100     99
13 BA         10     0     89

Now testing the second example of your question (I assumed that -21 in the expected results is an error and was in fact inherited from previous calculations and so I decided to ignore this results and focused only on the last part because getting -21 at this step does not make any sense based on your data):

Test2 = data.frame(product = c("AA","AA","AA","AA"), 
                 sale = c(0, 0,4,2), 
                 stock = c(0, 0, 60, 0))

Test2 %>% group_by(product) %>% mutate(New = stock - sale) %>%
  mutate(New2 = ifelse(lag((stock-sale) <0), stock, stock-sale)) %>%
  mutate(New3 = ifelse(is.na(New2),New, New2)) %>%
  mutate(New4 = ifelse(lag(New3 >= 0), lag(New3)+stock-sale,New3)) %>%
  mutate(New4 = ifelse(is.na(New4),stock-sale,New4)) %>%
  select(product, sale, stock, Result = New4)

# A tibble: 4 x 4
# Groups:   product [1]
  product  sale stock Result
  <fct>   <dbl> <dbl>  <dbl>
1 AA          0     0      0
2 AA          0     0      0
3 AA          4    60     56
4 AA          2     0     54

And finally the last example of your question:

Test3 = data.frame(product = c("AA","AA","AA","AA","AA"), 
                   sale = c(2,2,0,2,4), 
                   stock = c(0, 0, 0,100, 0))

Test3 %>% group_by(product) %>% mutate(New = stock - sale) %>%
    mutate(New2 = ifelse(lag((stock-sale) <0), stock, stock-sale)) %>%
    mutate(New3 = ifelse(is.na(New2),New, New2)) %>%
    mutate(New4 = ifelse(lag(New3 >= 0), lag(New3)+stock-sale,New3)) %>%
    mutate(New4 = ifelse(is.na(New4),stock-sale,New4)) %>%
    select(product, sale, stock, Result = New4)

# A tibble: 5 x 4
# Groups:   product [1]
  product  sale stock Result
  <fct>   <dbl> <dbl>  <dbl>
1 AA          2     0     -2
2 AA          2     0      0
3 AA          0     0      0
4 AA          2   100     98
5 AA          4     0     94

Hope that it will be satisfying for you. Otherwise, as mentioned in comments by @RonakShah, I invited you to re-consider your approach to clearly defined what you need, how to calculate it and what is the rationale of your output.

Upvotes: 1

Related Questions