Roger Steinberg
Roger Steinberg

Reputation: 1604

Join information from one table to another without joining twice cross apply T-SQL

Situation:

I have two tables. #t1 has logins and emails. #t2 has country associated to each emails. I would like to join the information from #t2 to #t1 without having to join it twice. Joining it only once either in the inner or outer query would break the cross apply logic. My current query uses a cross apply to get rolling information as such (fiddle data below):

SELECT DISTINCT CAST(logins AS DATE) AS Dates,
    count(distinct d.email) AS DAU,
    count(distinct m.MAU) AS MAU
FROM #t1 d
CROSS APPLY (
            SELECT CAST(m.logins as date) as dates, m.email AS MAU
            FROM #t1 m
            WHERE m.logins BETWEEN d.logins and DATEADD(dd, 30, d.logins) 
            ) m
group by CAST(logins as date)

The only way i found to join the two tables without having to break my cross apply was to inner join it in both the outer and inner query which is probably wrong but at least the output is correct. I do that so i can add my second condition in my where statement in the inner query. when i apply the logic to my actual table, the performance is dreadful(fiddle data below):

SELECT distinct CASt(logins AS DATE) AS Dates,
    #t2.country,
    count(distinct d.email) AS DAU,
    count(distinct m.MAU) AS MAU
FROM #t1 d
inner join #t2 on d.email=#t2.email 
CROSS APPLY (
            SELECT cast(m.logins as date) as dates, m.email AS MAU, country.country AS country
            FROM #t1 m
            inner join #t2 country on m.email=country.email         
            WHERE m.logins BETWEEN d.logins and DATEADD(dd, 30, d.logins)  
            and #t2.country = country.country
            ) m
group by cast(logins as date), #t2.country


+-------------+---------+-----+-----+
|    Dates    | country | DAU | MAU |
+-------------+---------+-----+-----+
|  2019-04-01 | france  |   1 |   2 |
|  2019-04-02 | france  |   1 |   2 |
|  2019-04-03 | france  |   1 |   2 |
|  2019-04-10 | france  |   1 |   1 |
|  2019-04-03 | italie  |   2 |   2 |
+-------------+---------+-----+-----+

Objective:

How can i find a way to join information from one table to another without having to join it twice. (fiddle data below)

The result should look like this (output from the second query above):

Fiddle:

create table #t1 (email varchar(max), logins datetime)
insert into #t1 values 
('[email protected]', '2019-04-01 00:00:00.000'),
('[email protected]', '2019-04-02 00:00:00.000'), 
('[email protected]', '2019-04-03 00:00:00.000'), 
('[email protected]', '2019-04-10 00:00:00.000'), 

('[email protected]', '2019-04-03 00:00:00.000'), 
('[email protected]', '2019-04-03 00:00:00.000'), 
('[email protected]', '2019-04-03 00:00:00.000')  

create table #t2 (country varchar(max), email varchar(max))
insert into #t2 values 
('france', '[email protected]'),
('france', '[email protected]'),
('italie', '[email protected]'),
('italie', '[email protected]')

Upvotes: 2

Views: 60

Answers (1)

GreyOrGray
GreyOrGray

Reputation: 1729

Update

So I had initially said the second should peform better, but I'll eat those words. The first works better by far in my testing.

In my test environment I generated your tables as permanent tables and first populated #t2 (I called it emailLocation) with 100,000 unique email addresses spread across 206 countries. The second table (loginRecord) was populated with 2,000,000 random entries spread across 1/1/2018 - 12/31/2019. Both of these tables are indexed.

The below query is basically the one I said would be slower (it isn't). The major difference in this one is that I am filtering the dates within the CTE to reduce the data set. In my environment this runs in 20 seconds and returns 48,410 rows. I didn't test how long it would take to return the whole set, but trying this same CTE with a self-join ran for 10 minutes before I killed it.

WITH joined AS
(
    SELECT
        t1.logins AS dates,
        t1.email,
        t2.country
        FROM loginRecord t1
        JOIN dbo.emailLocation t2 ON t2.email = t1.email
        WHERE t1.logins > GETDATE()
)
SELECT 
    dates,
    country,
    COUNT(DISTINCT(email)) AS DAU,
    (SELECT COUNT(DISTINCT(email)) FROM joined WHERE country = j.country AND dates BETWEEN j.dates AND DATEADD(DAY,30,j.dates)) AS MAU
FROM joined j
GROUP BY j.dates, j.country
ORDER BY country, dates

---original answer

It feels like you're stuck on using the cross apply logic. Here are two options that don't use cross apply. Both use a CTE to get a nice clean grouping of your temp tables, then the first option is a correlated subquery (blech) and the second is a self join.

Rextester here: https://rextester.com/AVJS76389

WITH joined AS
(
    SELECT 
        t1.logins AS dates,
        t1.email,
        t2.country
        FROM #t1 t1
        JOIN #t2 t2 ON t2.email = t1.email
)
SELECT 
    dates,
    country,
    COUNT(DISTINCT(email)) AS DAU,
    (SELECT COUNT(DISTINCT(email)) FROM joined WHERE country = j.country AND dates BETWEEN j.dates AND DATEADD(DAY,30,j.dates)) AS MAU
FROM joined j
GROUP BY j.dates, j.country;



WITH joined AS
(
    SELECT 
        t1.logins AS dates,
        t1.email,
        t2.country
        FROM #t1 t1
        JOIN #t2 t2 ON t2.email = t1.email
)
SELECT 
    j1.dates,
    j1.country,
    COUNT(DISTINCT(j1.email)) AS DAU,
    COUNT(DISTINCT(j2.email)) AS MAU
FROM joined j1
JOIN joined j2
    ON j1.country = j2.country
    AND j2.dates BETWEEN j1.dates AND DATEADD(DAY,30,j1.dates)
GROUP BY j1.dates, j1.country 

Upvotes: 1

Related Questions