Reputation: 86
Im facing an issue with this task I have. Basically I have price data and along with it, the time and day the item was bought. With this I want to find out how many times price changes during the day for an item. for example, in the morning the price was 7000 but in the evening it could be for 4000. So I want to do this over multiple days for multiple items. Theres also an Order ID associated with the purchase but it need not be unique.
I found this article really helpful and i was able to make some progress but cant get exactly what I need Identify a value changes' date and summarize the data with sum() and diff() in R
I've put the dput output for the data so that it can be recreated. The results should look like this
Item Price_changed_over_all_days Price_changed_in_one_day
x 10 3
y 4 1
z 5 2
Thank you for any advice/help! Please do let me know if I can make the question any clearer.
PS: If possible too, i'd like to be able to tell the highest and lowest price of any given day, if that can be figured out too that'd be great. I know how to do this for a particular day but
structure(list(item = c("x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x",
"x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y", "y",
"y", "y", "y", "y", "y", "y", "y", "z", "z", "z", "z", "z", "z",
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z",
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z",
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z",
"z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z", "z"),
bought_date = structure(c(1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600646400, 1600646400, 1600646400,
1600646400, 1600646400, 1600646400, 1600646400, 1600646400,
1600646400, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800, 1600732800,
1600732800, 1600732800, 1600732800, 1600732800), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), bought_time = structure(c(-2209016101,
-2209014165, -2209006172, -2208996246, -2208992947, -2208991967,
-2209070025, -2209069890, -2209064616, -2209055193, -2209054850,
-2209053617, -2209050638, -2209050426, -2209048499, -2209047983,
-2209047872, -2209047390, -2209046473, -2209045120, -2209044562,
-2209044418, -2209042104, -2209041480, -2209040748, -2209037870,
-2209037696, -2209037309, -2209035846, -2209034872, -2209034429,
-2209034323, -2209030237, -2209028615, -2209028570, -2209028477,
-2209026900, -2209026787, -2209025234, -2209024468, -2209023183,
-2209021020, -2209020175, -2209019934, -2209019733, -2209018417,
-2209016646, -2209016540, -2209015208, -2209014941, -2209011636,
-2209011444, -2209010896, -2209010639, -2209009483, -2209009412,
-2209008912, -2209007424, -2209006197, -2209005462, -2209005439,
-2209005414, -2209004221, -2208998803, -2208998727, -2208993252,
-2208993224, -2208993194, -2208992478, -2208992218, -2209019432,
-2209018785, -2209017271, -2209017188, -2209017177, -2209014531,
-2209014484, -2209014247, -2209013964, -2209012511, -2209009805,
-2209009633, -2209009617, -2209009556, -2209009533, -2209009499,
-2209009474, -2209008099, -2209007958, -2209000389, -2209068522,
-2209062412, -2209062053, -2209058480, -2209058472, -2209058161,
-2209057878, -2209057740, -2209056037, -2209055339, -2209055045,
-2209054472, -2209051624, -2209050659, -2209050339, -2209047529,
-2209045264, -2209038811, -2209038586, -2209038487, -2209038004,
-2209036906, -2209036606, -2209034142, -2209034049, -2209033773,
-2209030890, -2209030794, -2209030626, -2209029600, -2209029464,
-2209027707, -2209026486, -2209024697, -2209021552, -2209021379,
-2209019844, -2209019716, -2209018482, -2209018436, -2209018365,
-2209017376, -2209017340, -2209017319, -2209017054, -2209016900,
-2209016126, -2209014622, -2209013286, -2209012584, -2209009905,
-2209009208, -2209006827, -2209006663, -2208990872, -2209020164,
-2209015899, -2209013965, -2209013933, -2209011963, -2209010443,
-2209010351, -2209008868, -2209007569, -2209063141, -2209063059,
-2209062882, -2209062852, -2209054720, -2209054349, -2209050324,
-2209049810, -2209047902, -2209041612, -2209039205, -2209038444,
-2209038393, -2209038219, -2209037598, -2209037562, -2209037497,
-2209037082, -2209036943, -2209036795, -2209036404, -2209034846,
-2209032324, -2209032289, -2209031999, -2209031958, -2209030309,
-2209029952, -2209023411, -2209022296, -2209021086, -2209020624,
-2209020221, -2209019575, -2209017996, -2209017794, -2209014135,
-2209011509, -2209009303, -2209007905, -2209007799, -2209007709,
-2209005139, -2209004957, -2208998695, -2208998233, -2208990008,
-2208989978), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
ID = c(540273, 540333, 540568, 540734, 540766, 540771, 540808,
540810, 540847, 541011, 541022, 541060, 541147, 541160, 541231,
541252, 541259, 541283, 541317, 541379, 541396, 541399, 541503,
541537, 541562, 541682, 541684, 541704, 541779, 541849, 541879,
541883, 542039, 542115, 542117, 542120, 542164, 542166, 542207,
542236, 542275, 542358, 542394, 542403, 542414, 542457, 542515,
542522, 542579, 542598, 542741, 542749, 542772, 542786, 542825,
542831, 542854, 542934, 542975, 543003, 543004, 543005, 543044,
543109, 543111, 543156, 543158, 543159, 543162, 543164, 540161,
540187, 540230, 540231, 540233, 540322, 540324, 540329, 540344,
540384, 540468, 540477, 540480, 540482, 540483, 540485, 540486,
540522, 540526, 540683, 540820, 540876, 540880, 540917, 540918,
540927, 540934, 540935, 540989, 541005, 541014, 541034, 541114,
541146, 541163, 541276, 541371, 541646, 541653, 541658, 541678,
541725, 541738, 541892, 541895, 541916, 542015, 542021, 542028,
542080, 542084, 542143, 542175, 542225, 542333, 542337, 542409,
542415, 542455, 542456, 542460, 542482, 542485, 542487, 542500,
542505, 542544, 542610, 542677, 542704, 542814, 542837, 542950,
542955, 543174, 540141, 540281, 540343, 540348, 540401, 540453,
540457, 540500, 540535, 540865, 540866, 540869, 540871, 541027,
541038, 541165, 541187, 541257, 541533, 541627, 541661, 541662,
541668, 541691, 541693, 541695, 541713, 541723, 541731, 541751,
541850, 541960, 541963, 541978, 541981, 542035, 542053, 542269,
542301, 542355, 542375, 542390, 542424, 542466, 542471, 542642,
542745, 542835, 542911, 542917, 542920, 543019, 543031, 543112,
543117, 543178, 543179), price = c(7190, 9200, 7170, 7170,
7170, 9170, 7170, 7170, 7170, 9170, 7170, 9170, 8330, 7170,
9170, 7170, 9170, 7170, 7170, 7170, 7170, 7170, 7170, 7170,
7170, 7170, 7170, 9170, 9170, 9170, 9170, 9170, 8330, 7170,
7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170, 9170,
7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170,
7170, 7170, 9170, 7170, 7160, 7160, 7160, 7160, 7160, 7160,
7160, 7160, 8330, 8330, 8330, 8330, 7190, 7190, 7190, 7190,
7190, 7190, 7190, 7190, 9200, 9200, 7190, 7190, 7190, 7190,
7190, 7190, 7190, 9200, 9200, 9170, 9170, 7170, 7170, 9170,
7170, 7170, 9170, 9170, 8330, 9170, 8330, 8330, 9170, 7170,
9170, 7170, 7170, 9170, 9170, 9170, 7170, 7170, 7170, 7170,
7170, 7170, 7170, 7170, 7170, 7170, 7170, 9170, 7170, 9170,
9170, 9170, 7170, 7170, 7170, 9170, 9170, 7170, 7170, 7170,
7170, 7170, 7170, 8330, 7170, 9170, 7170, 9170, 7170, 7170,
9160, 7190, 9200, 7190, 7190, 14880, 9200, 9200, 9200, 7190,
7170, 7170, 7170, 7170, 7170, 7170, 8330, 7170, 9170, 7170,
9170, 9170, 9170, 9170, 7170, 7170, 7170, 9170, 7170, 7170,
7170, 7170, 7170, 7170, 9170, 9170, 9170, 7170, 7170, 7170,
9170, 9170, 9170, 9170, 7170, 7170, 7170, 8330, 7170, 7170,
7170, 7170, 7160, 7160, 9160, 7160, 9160, 9160)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -202L))
Upvotes: 0
Views: 124
Reputation: 41230
With data.table
you could use rleid
:
library(data.table)
setDT(data)
data[,.(times=max(rleid(price))-1),by=.(item)]
# item times
#1: x 24
#2: y 30
#3: z 24
data[,.(timesday=max(rleid(price))-1),by=.(item,bought_date)]
# item bought_date timesday
#1: x 2020-09-21 3
#2: x 2020-09-22 20
#3: y 2020-09-21 4
#4: y 2020-09-22 26
#5: z 2020-09-21 5
#6: z 2020-09-22 18
Upvotes: 3
Reputation: 4949
For starters, I assume your data is in variable df
.
You have to think carefully about what you want to count.
If you want the number of price changes regardless of the days, you can do this:
df %>% group_by(item) %>%
summarise(Price_changed_over_all_days =
sum((lead(price) - price)!=0, na.rm = TRUE))
# A tibble: 3 x 2
# item Price_changed_over_all_days
# <chr> <int>
#1 x 24
#2 y 30
#3 z 24
However, if you want to count the number of price changes in particular days, you will get something like this:
df %>% group_by(item, bought_date) %>%
summarise(Price_changed_in_one_day =
sum((lead(price) - price)!=0, na.rm = TRUE))
# A tibble: 6 x 3
# Groups: item [3]
# item bought_date Price_changed_in_one_day
# <chr> <dttm> <int>
#1 x 2020-09-21 00:00:00 3
#2 x 2020-09-22 00:00:00 20
#3 y 2020-09-21 00:00:00 4
#4 y 2020-09-22 00:00:00 26
#5 z 2020-09-21 00:00:00 5
#6 z 2020-09-22 00:00:00 18
It's just that in this case, you have a lot more rows in the summary table. If you only want one table, you have to somehow assemble it and decide some stats based on the values for the days. Maybe the average will be appropriate here? I do not know that.
df %>% group_by(item) %>%
summarise(Price_changed_over_all_days =
sum((lead(price) - price)!=0, na.rm = TRUE)) %>%
left_join(
df %>% group_by(item, bought_date) %>%
summarise(Price_changed_in_one_day =
sum((lead(price) - price)!=0, na.rm = TRUE)) %>%
group_by(item) %>%
summarise(Price_changed_in_one_day =
mean(Price_changed_in_one_day)
), by= "item")
# A tibble: 3 x 3
# item Price_changed_over_all_days Price_changed_in_one_day
# <chr> <int> <dbl>
#1 x 24 11.5
#2 y 30 15
#3 z 24 11.5
Also note that price changes can occur at the turn of the day and therefore the sum of changes in several days for a given product does not have to equal the sum of all price changes for that product. In your case, this is the case for product "x".
Upvotes: 4