Error : missing value where TRUE/FALSE needed

Question

WEEK PRICE QUANTITY SALE_PRICE TYPE
1  4992  5.99  2847.50       0.00    3
2  4995  3.33 36759.00       3.33    3
3  4996  5.99  2517.00       0.00    3
4  4997  5.49  2858.50       0.00    3
5  5001  3.33 32425.00       3.33    3
6  5002  5.49  4205.50       0.00    3
7  5004  5.99  4329.50       0.00    3
8  5006  2.74 55811.00       2.74    3
9  5007  5.49  4133.00       0.00    3
10 5008  5.99  4074.00       0.00    3
11 5009  3.99 12125.25       3.99    3
12 5017  2.74 77645.00       2.74    3
13 5018  5.49  5315.50       0.00    3
14 5020  2.74 78699.00       2.74    3
15 5021  5.49  5158.50       0.00    3
16 5023  5.99  5315.00       0.00    3
17 5024  5.49  6545.00       0.00    3
18 5025  3.33 63418.00       3.33    3

If there are consecutive 0 sale price entries then I want to keep last entry with sale price 0. Like I want to remove week 4996 and want to keep week 4997, I want week 5004 and I want to remove 5002. Similarly I want to delete 5021 & 5023 and want to keep week 5024.

akrun · Accepted Answer

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)). create a grouping variable with rleid based on a logical vector of the presence of 0 in 'SALE_PRICE' (!SALE_PRICE). Using the 'grp' as grouping variable, we get the last row of 'Subset of Data.table (.SD[.N]) if the 'SALE_PRICEelements areall0 orelseget the.SD` i.e. the full rows for a particular group.

library(data.table)
setDT(df1)[, grp:= rleid(!SALE_PRICE)
           ][,if(all(!SALE_PRICE)) .SD[.N] else .SD  , grp
            ][, grp := NULL][]
#     WEEK PRICE QUANTITY SALE_PRICE TYPE
# 1: 4992  5.99  2847.50       0.00    3
# 2: 4995  3.33 36759.00       3.33    3
# 3: 4997  5.49  2858.50       0.00    3
# 4: 5001  3.33 32425.00       3.33    3
# 5: 5004  5.99  4329.50       0.00    3
# 6: 5006  2.74 55811.00       2.74    3
# 7: 5008  5.99  4074.00       0.00    3
# 8: 5009  3.99 12125.25       3.99    3
# 9: 5017  2.74 77645.00       2.74    3
#10: 5018  5.49  5315.50       0.00    3
#11: 5020  2.74 78699.00       2.74    3
#12: 5024  5.49  6545.00       0.00    3
#13: 5025  3.33 63418.00       3.33    3

Or an option using dplyr by creating a grouping variable with diff and cumsum, then filter the rows to keep only the last row of 'SALE_PRICE' that are 0 or (|) select the rows where 'SALE_PRICE' is not 0.

library(dplyr)
df1 %>%
  group_by(grp = cumsum(c(TRUE,diff(!SALE_PRICE)!=0))) %>% 
  filter( !duplicated(!SALE_PRICE, fromLast=TRUE)|SALE_PRICE!=0) %>%
  select(-grp)
#     grp  WEEK PRICE QUANTITY SALE_PRICE  TYPE
#    (int) (int) (dbl)    (dbl)      (dbl) (int)
#1      1  4992  5.99  2847.50       0.00     3
#2      2  4995  3.33 36759.00       3.33     3
#3      3  4997  5.49  2858.50       0.00     3
#4      4  5001  3.33 32425.00       3.33     3
#5      5  5004  5.99  4329.50       0.00     3
#6      6  5006  2.74 55811.00       2.74     3
#7      7  5008  5.99  4074.00       0.00     3
#8      8  5009  3.99 12125.25       3.99     3
#9      8  5017  2.74 77645.00       2.74     3
#10     9  5018  5.49  5315.50       0.00     3
#11    10  5020  2.74 78699.00       2.74     3
#12    11  5024  5.49  6545.00       0.00     3
#13    12  5025  3.33 63418.00       3.33     3

Error : missing value where TRUE/FALSE needed

Answers (1)

Related Questions