Reputation: 97
WEEK PRICE QUANTITY SALE_PRICE TYPE
1 4992 5.99 2847.50 0.00 3
2 4995 3.33 36759.00 3.33 3
3 4996 5.99 2517.00 0.00 3
4 4997 5.49 2858.50 0.00 3
5 5001 3.33 32425.00 3.33 3
6 5002 5.49 4205.50 0.00 3
7 5004 5.99 4329.50 0.00 3
8 5006 2.74 55811.00 2.74 3
9 5007 5.49 4133.00 0.00 3
10 5008 5.99 4074.00 0.00 3
11 5009 3.99 12125.25 3.99 3
12 5017 2.74 77645.00 2.74 3
13 5018 5.49 5315.50 0.00 3
14 5020 2.74 78699.00 2.74 3
15 5021 5.49 5158.50 0.00 3
16 5023 5.99 5315.00 0.00 3
17 5024 5.49 6545.00 0.00 3
18 5025 3.33 63418.00 3.33 3
If there are consecutive 0 sale price entries then I want to keep last entry with sale price 0. Like I want to remove week 4996 and want to keep week 4997, I want week 5004 and I want to remove 5002. Similarly I want to delete 5021 & 5023 and want to keep week 5024.
Upvotes: 0
Views: 1854
Reputation: 887911
We can use data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1)
). create a grouping variable with rleid
based on a logical vector of the presence of 0 in 'SALE_PRICE' (!SALE_PRICE
). Using the 'grp' as grouping variable, we get the last row of 'Subset of Data.table (.SD[.N]
) if
the 'SALE_PRICEelements are
all0 or
elseget the
.SD` i.e. the full rows for a particular group.
library(data.table)
setDT(df1)[, grp:= rleid(!SALE_PRICE)
][,if(all(!SALE_PRICE)) .SD[.N] else .SD , grp
][, grp := NULL][]
# WEEK PRICE QUANTITY SALE_PRICE TYPE
# 1: 4992 5.99 2847.50 0.00 3
# 2: 4995 3.33 36759.00 3.33 3
# 3: 4997 5.49 2858.50 0.00 3
# 4: 5001 3.33 32425.00 3.33 3
# 5: 5004 5.99 4329.50 0.00 3
# 6: 5006 2.74 55811.00 2.74 3
# 7: 5008 5.99 4074.00 0.00 3
# 8: 5009 3.99 12125.25 3.99 3
# 9: 5017 2.74 77645.00 2.74 3
#10: 5018 5.49 5315.50 0.00 3
#11: 5020 2.74 78699.00 2.74 3
#12: 5024 5.49 6545.00 0.00 3
#13: 5025 3.33 63418.00 3.33 3
Or an option using dplyr
by creating a grouping variable with diff
and cumsum
, then filter
the rows to keep only the last row of 'SALE_PRICE' that are 0 or (|
) select the rows where 'SALE_PRICE' is not 0.
library(dplyr)
df1 %>%
group_by(grp = cumsum(c(TRUE,diff(!SALE_PRICE)!=0))) %>%
filter( !duplicated(!SALE_PRICE, fromLast=TRUE)|SALE_PRICE!=0) %>%
select(-grp)
# grp WEEK PRICE QUANTITY SALE_PRICE TYPE
# (int) (int) (dbl) (dbl) (dbl) (int)
#1 1 4992 5.99 2847.50 0.00 3
#2 2 4995 3.33 36759.00 3.33 3
#3 3 4997 5.49 2858.50 0.00 3
#4 4 5001 3.33 32425.00 3.33 3
#5 5 5004 5.99 4329.50 0.00 3
#6 6 5006 2.74 55811.00 2.74 3
#7 7 5008 5.99 4074.00 0.00 3
#8 8 5009 3.99 12125.25 3.99 3
#9 8 5017 2.74 77645.00 2.74 3
#10 9 5018 5.49 5315.50 0.00 3
#11 10 5020 2.74 78699.00 2.74 3
#12 11 5024 5.49 6545.00 0.00 3
#13 12 5025 3.33 63418.00 3.33 3
Upvotes: 1