Reputation: 17
I have a Julia DataFrame with one column of dates and another column of sales data. For each date entry, I want the sum of the sales over exactly the preceding 360 days.
I haven't gotten anything to work so far. Any help is much appreciated.
Upvotes: -1
Views: 112
Reputation: 18217
As mentioned in the comment, the question could have included more code to make the problem more obvious and easier to work on. Having said that, here is an example:
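using Dates
using DataFrames
using FlexiJoins
using IntervalSets
using Random

Random.seed!(111)

# Sample data: 100 random sales for two customers, sorted by date
df = sort!(DataFrame(date=rand(Date(2020,1,1):Date(2022,1,1),100),
                     customer=rand(["A","B"],100),
                     amount=rand((1:10_000)./100,100)),[:date])

# Self-join: pair each row with the same customer's rows whose date falls in
# the 360-day interval ending at that row's date, then sum the matched amounts
combine(
    groupby(
        leftjoin((df, df), by_key(:customer) & by_pred(x->x.date-Day(360)..x.date, ∋, :date)),
        [:date, :customer, :amount]),
    :amount_1 => sum => :last360)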
The data is generated in df, and the rolling sum is then calculated with the combine statement. Various tweaks might be necessary (does 360 days include the same day or not? Is it 360 days or one calendar year?).
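To make those tweaks concrete, here is a minimal sketch that uses only the Dates standard library and plain vectors instead of the join. It is a brute-force illustration, not the FlexiJoins approach: `trailing_sum`, its `window` and `include_start` keywords, and the toy data are all hypothetical names made up for this example. Note that `..` from IntervalSets builds a closed interval, so the join above counts both the current day and the day exactly 360 days earlier; `include_start=false` shows what changes if the far endpoint is dropped, and `window=Year(1)` shows the calendar-year variant.

```julia
using Dates

# Hypothetical helper (not from any package): for each row i, sum the amounts
# of rows in the same group whose date lies in the window ending at dates[i].
# Brute force, O(n^2); meant to illustrate the window choices, not for speed.
function trailing_sum(dates, groups, amounts; window=Day(360), include_start=true)
    out = zeros(float(eltype(amounts)), length(dates))
    for i in eachindex(dates)
        lo = dates[i] - window              # start of the window for row i
        for j in eachindex(dates)
            groups[j] == groups[i] || continue
            # include_start decides whether the day exactly `window` ago counts
            after_start = include_start ? dates[j] >= lo : dates[j] > lo
            if after_start && dates[j] <= dates[i]
                out[i] += amounts[j]
            end
        end
    end
    return out
end

# Toy data: 2020-12-26 is exactly 360 days after 2020-01-01 (2020 is a leap year)
dates     = [Date(2020,1,1), Date(2020,12,26), Date(2020,12,27), Date(2021,1,1)]
customers = ["A", "A", "A", "A"]
amounts   = [10.0, 1.0, 2.0, 4.0]

closed   = trailing_sum(dates, customers, amounts)                       # both endpoints count
halfopen = trailing_sum(dates, customers, amounts; include_start=false)  # far endpoint dropped
yearly   = trailing_sum(dates, customers, amounts; window=Year(1))       # calendar year back

println(closed)    # [10.0, 11.0, 3.0, 7.0]
println(halfopen)  # [10.0, 1.0, 3.0, 7.0]
println(yearly)    # [10.0, 11.0, 13.0, 17.0]
```

The second row is the interesting one: the sale on 2020-01-01 is counted with the closed window but not with the half-open one, and the calendar-year window picks it up again for the later rows.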
P.S. There may be other ways to achieve this, and other answers may come up. For example, the InMemoryDatasets package might offer an easier solution. Also, using FlexiJoins alone, without DataFrames, will probably be faster (see the FlexiJoins documentation).
Upvotes: 1