Reputation: 359
I have a transactional database with sales data and user id like the following:
id_usuarioweb dt_fechaventa
1551415 2015-08-01 14:57:21.737
1551415 2015-08-06 15:34:21.920
6958538 2015-07-30 09:26:24.427
6958538 2015-08-05 09:30:06.247
6958538 2015-08-31 17:39:02.027
39101175 2015-08-05 16:34:17.990
39101175 2015-09-20 20:37:26.043
1551415 2015-09-05 13:41:43.767
3673384 2015-09-06 13:34:23.440
And I would like to calculate the average diference between dates by the same customer in the data base (to find average frequency with which the user buys).
I'm aware I can do datediff with two columns, but i'm have issues trying to do it in the same field and "grouping" by user id.
The desired outcome would be like this:
id_usuarioweb avgtime_days
1551415 5
6958538 25
39101175 25
1551415 0
3673384 0
How can I achieve this? I would have the database ordered by user_id and then dt_fechaventa (the sale time).
USING: SQL Server 2008
Upvotes: 6
Views: 4409
Reputation: 3523
I think what you are looking for is calculated like this. Take the maximum and minimum dates, get the difference between them and divide by the number of purchases.
SELECT id_usuarioweb, CASE
WHEN COUNT(*) < 2
THEN 0
ELSE DATEDIFF(dd,
MIN(
dt_fechaventa
), MAX(
dt_fechaventa
)) / (
COUNT(*) -
1
)
END AS avgtime_days
FROM mytable
GROUP BY id_usuarioweb
EDIT: (by @GordonLinoff)
The reason that this is correct is easily seen if you look at the math. Consider three dates, a, b, and c.
The average time between them is:
((b - a) + (c - b)) / 2
This simplifies to:
(c - a) / 2
In other words, the intermediate value cancels out. And, this continues regardless of the number of intermediate values.
Upvotes: 10
Reputation: 49260
You can first number rows using row_number
and then do a self-join
with the cte
. Then perform the average
. However, you would get one row per user, but not as per the expected result.
with x as
(select id_usuarioweb, dt_fechaventa,
row_number() over(partition by id_usuarioweb order by dt_fechaventa) as rn
from tablename)
select x1.id_usuarioweb, avg(datediff(dd,x1.dt_fechaventa,x2.dt_fechaventa)) as avgdiff
from x x1 join x x2
on x1.id_usuarioweb = x2.id_usuarioweb and x1.rn = x2.rn-1
group by x1.id_usuarioweb
Upvotes: 1
Reputation: 70638
This should do:
;WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY id_usuarioweb ORDER BY dt_fechaventa),
N = COUNT(*) OVER(PARTITION BY id_usuarioweb)
FROM dbo.YourTable
)
SELECT A.id_usuarioweb,
AVG(DATEDIFF(DAY,A.dt_fechaventa,B.dt_fechaventa)) avgtime_days
FROM CTE A
INNER JOIN CTE B
ON A.id_usuarioweb = B.id_usuarioweb
AND A.RN = B.RN - 1
WHERE A.N > 1
GROUP BY A.id_usuarioweb;
I'm filtering the users that only have one row there, because you can't calculate an average of days with them.
Here is a demo in sqlfiddle of this. And the results are:
╔═══════════════╦══════════════╗
║ id_usuarioweb ║ avgtime_days ║
╠═══════════════╬══════════════╣
║ 1551415 ║ 17 ║
║ 6958538 ║ 16 ║
║ 39101175 ║ 46 ║
╚═══════════════╩══════════════╝
Upvotes: 2