eflores89
eflores89

Reputation: 359

Average time between dates in same field by groups

I have a transactional database with sales data and user id like the following:

id_usuarioweb   dt_fechaventa
1551415         2015-08-01 14:57:21.737
1551415         2015-08-06 15:34:21.920
6958538         2015-07-30 09:26:24.427
6958538         2015-08-05 09:30:06.247
6958538         2015-08-31 17:39:02.027
39101175        2015-08-05 16:34:17.990
39101175        2015-09-20 20:37:26.043
1551415         2015-09-05 13:41:43.767
3673384         2015-09-06 13:34:23.440

And I would like to calculate the average diference between dates by the same customer in the data base (to find average frequency with which the user buys).

I'm aware I can do datediff with two columns, but i'm have issues trying to do it in the same field and "grouping" by user id.

The desired outcome would be like this:

id_usuarioweb   avgtime_days
1551415         5
6958538         25
39101175        25
1551415         0
3673384         0

How can I achieve this? I would have the database ordered by user_id and then dt_fechaventa (the sale time).

USING: SQL Server 2008

Upvotes: 6

Views: 4409

Answers (3)

Greg Viers
Greg Viers

Reputation: 3523

I think what you are looking for is calculated like this. Take the maximum and minimum dates, get the difference between them and divide by the number of purchases.

SELECT id_usuarioweb, CASE 
        WHEN COUNT(*) < 2
            THEN 0
        ELSE DATEDIFF(dd, 
                MIN(
                    dt_fechaventa
                ), MAX(
                    dt_fechaventa
                )) / (
                COUNT(*) - 
                1
                )
        END AS avgtime_days
FROM mytable
GROUP BY id_usuarioweb

EDIT: (by @GordonLinoff)

The reason that this is correct is easily seen if you look at the math. Consider three dates, a, b, and c.

The average time between them is:

((b - a) + (c - b)) / 2

This simplifies to:

(c - a) / 2

In other words, the intermediate value cancels out. And, this continues regardless of the number of intermediate values.

Upvotes: 10

Vamsi Prabhala
Vamsi Prabhala

Reputation: 49260

You can first number rows using row_number and then do a self-join with the cte. Then perform the average. However, you would get one row per user, but not as per the expected result.

with x as 
(select id_usuarioweb, dt_fechaventa,
row_number() over(partition by id_usuarioweb order by dt_fechaventa) as rn
from tablename)
select x1.id_usuarioweb, avg(datediff(dd,x1.dt_fechaventa,x2.dt_fechaventa)) as avgdiff
from x x1 join x x2 
on x1.id_usuarioweb = x2.id_usuarioweb and x1.rn = x2.rn-1
group by x1.id_usuarioweb

Upvotes: 1

Lamak
Lamak

Reputation: 70638

This should do:

;WITH CTE AS
(
    SELECT  *, 
            RN = ROW_NUMBER() OVER(PARTITION BY id_usuarioweb ORDER BY dt_fechaventa),
            N  = COUNT(*) OVER(PARTITION BY id_usuarioweb)
    FROM dbo.YourTable
)
SELECT  A.id_usuarioweb,
        AVG(DATEDIFF(DAY,A.dt_fechaventa,B.dt_fechaventa)) avgtime_days
FROM CTE A
INNER JOIN CTE B
    ON A.id_usuarioweb = B.id_usuarioweb
    AND A.RN = B.RN - 1
WHERE A.N > 1
GROUP BY A.id_usuarioweb;

I'm filtering the users that only have one row there, because you can't calculate an average of days with them.

Here is a demo in sqlfiddle of this. And the results are:

╔═══════════════╦══════════════╗
║ id_usuarioweb ║ avgtime_days ║
╠═══════════════╬══════════════╣
║       1551415 ║           17 ║
║       6958538 ║           16 ║
║      39101175 ║           46 ║
╚═══════════════╩══════════════╝

Upvotes: 2

Related Questions