Reputation: 23
I am trying to create a cumulative moving average in SQLite.
As a refresher, in a cumulative moving average (CMA), the data arrives in an ordered datum stream, and I would like to get the average of all of the data up until the current datum point.
My Table looks like:
Continent,Date,Measure,Value
Antarctica,03/01/2019 12:00:00 AM,Passengers,346158
South America,03/01/2019 12:00:00 AM,Ships,6483
South America,03/01/2019 12:00:00 AM,Flights,19
Antarctica,02/01/2019 12:00:00 AM,Passengers,172163
South America,02/01/2019 12:00:00 AM,Cargo Ships,1319
Antarctica,01/01/2019 12:00:00 AM,Passengers,56810
Previous solutions such as 1 or 2 describe a moving average monthly or weekly. However, while I can hold this average month to month, I am trying to build a cumulative average.
I tried doing this:
SELECT T1.Date, AVG(T2.VALUE) from my_table AS T1 INNER JOIN my_table AS T2 ON datetime(T1.Date, '-1 Month') <= datetime(T2.Date)
AND datetime(T1.Date, '+1 Month') >= datetime(T2.Date) GROUP BY
T1.date;
But when I use sqlite
, the datetime operation produces an error: sqlite does not have operation datetime.
I even tried the simple command: SELECT AVG(VALUE) FROM my_table GROUP BY MEASURE, DATE, CONTINENT
, but that grouped by the moving average, which did not solve my issue.
What I am looking to do:
Continent,Date,Measure,Value,Average
Antarctica,03/01/2019 12:00:00 AM,Passengers,346158,114487
South America,03/01/2019 12:00:00 AM,Ships,6483,0
South America,03/01/2019 12:00:00 AM,Flights,19,0
Antarctica,02/01/2019 12:00:00 AM,Passengers,172163,56810
South America,02/01/2019 12:00:00 AM,Cargo Ships,1319,0
Antarctica,01/01/2019 12:00:00 AM,Passengers,56810,0
The column, Average
, is for the running monthly average of total crossings to the Continent and means of crossing in all previous months. So, to calculate the Average for the first line (i.e., running monthly average of total Passengers crossing Antarctica in all of the months preceding March), you'd take the average sum of total number of Antarctica passenger crossings in February 156,891 + 15,272 = 172,163
and January 56,810
, and round it to the nearest whole number round(228,973/2) = 114,487
.
Is there an easier to way to go about this?
Upvotes: 1
Views: 313
Reputation: 52579
First, fix your timestamps so they are in a format that can be sorted, like the ISO-8601 ones that sqlite date and time functions support. Instead of 03/01/2019 12:00:00 AM
, use 2019-03-01 00:00:00
(Or just 2019-03-01
if you don't care about the time, just the date). This leaves your CSV data looking like:
Continent,Date,Measure,Value
Antarctica,2019-03-01 00:00:00,Passengers,346158
South America,2019-03-01 00:00:00,Ships,6483
South America,2019-03-01 00:00:00,Flights,19
Antarctica,2019-02-01 00:00:00,Passengers,172163
South America,2019-02-01 00:00:00,Cargo Ships,1319
Antarctica,2019-01-01 00:00:00,Passengers,56810
Then you can use a window function (Introduced in Sqlite 3.25) to easily calculate the cumulative average of previous months:
SELECT continent, date, measure, value,
cast(round(ifnull(avg(value)
OVER (PARTITION BY continent, measure
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
0),
0) AS INTEGER) AS Average
FROM crossings
ORDER BY date DESC, continent, measure DESC;
which gives
Continent Date Measure Value Average
---------- ------------------- ---------- ---------- ----------
Antarctica 2019-03-01 00:00:00 Passengers 346158 114487
South Amer 2019-03-01 00:00:00 Ships 6483 0
South Amer 2019-03-01 00:00:00 Flights 19 0
Antarctica 2019-02-01 00:00:00 Passengers 172163 56810
South Amer 2019-02-01 00:00:00 Cargo Ship 1319 0
Antarctica 2019-01-01 00:00:00 Passengers 56810 0
If stuck on an older version without window function support, you can use a correlated subquery to calculate the cumulative average:
SELECT continent, date, measure, value,
ifnull((SELECT cast(round(avg(c2.value), 0) AS INTEGER)
FROM crossings AS c2
WHERE c2.continent = c.continent
AND c2.measure = c.measure
AND c2.date < c.date),
0) AS Average
FROM crossings AS c
ORDER BY date DESC, continent, measure DESC;
Both versions will benefit from an index on (continent, measure, date)
.
Upvotes: 2