Reputation: 204
I have a table of transactions with columns id | client_id | datetime
and I have calculated the mean of days between transactions to know how often this transactions are made by each client:
SELECT *, ((date_last_transaction - date_first_transaction)/total_transactions) AS frequency
FROM (
SELECT client_id, COUNT(id) AS total_transactions, MIN(datetime) AS date_first_transaction, MAX(datetime) AS date_last_transaction
FROM transactions
GROUP BY client_id
) AS t;
What would be the existing methods to calculate the standard deviation (in days) in a set of dates with postgresql? Preferably with only one query, if it is posible :-)
Upvotes: 2
Views: 2749
Reputation: 204
I have found this way:
SELECT extract(day from date_trunc('day', (
CASE WHEN COUNT(*) <= 1 THEN
0
ELSE
SUM(time_since_last_invoice)/(COUNT(*)-1)
END
) * '1 day'::interval)) AS days_between_purchases,
extract(day from date_trunc('day', (
CASE WHEN COUNT(*) <= 2 THEN
0
ELSE
STDDEV(time_since_last_invoice)
END
) * '1 day'::interval)) AS range_of_days
FROM (
SELECT client_id, datetime, COALESCE(datetime - lag(datetime)
OVER (PARTITION BY client_id ORDER BY client_id, datetime
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
), 0
) AS time_since_last_invoice
FROM my_table
GROUP BY client_id, datetime
ORDER BY client_id, datetime
)
Explanation:
This query groups by client and date and then calculates the difference between each pair of transaction dates (datetime
) by client_id
and returns a table with these results. After this, the external query processes the table and calculates de average time between differences greater than 0 (first value in each group is excluded because is the first transaction and therefore the interval is 0).
The standard deviation is calculated when there existe 2 o more transaction dates for the same client, to avoid division by zero errors.
All differences are returned in PostgreSQL interval format.
Upvotes: 1