Reputation: 789
I am trying to combine three separate queries into one, and still produce the same results but as a single table. ColumnA and ColumnB both are actually in the date format of 'yyyy-mm-dd', ideally the final result will simply be a column of dates and the separate counts from each query.
select columnA, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnA
select columnB, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnB
select columnB, count(distinct columnC)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
and columnX in ('itemA','ItemB')
group by columnB
Upvotes: 0
Views: 43
Reputation: 789
I was able to get it to work using the following method:
With pullA as
(
select columnA, count(*) as A_count
from data.table
group by columnA
),
pullB as
(
select columnB, count(*) as B_count
from data.table
group by columnB
),
pullC as
(
select columnB , count(*) as C_count
from data.table
where columnX in ('itemA', 'itemB')
group by columnB
)
select ColumnB, A_count, B_count, C_count
from pullB
left join pullA
on ColumnB = ColumnA
left join pullC
on ColumnB = ColumnC
Is this approach any more or less efficient than a union or subquery approach?
Upvotes: 0
Reputation: 2564
The following would seem to be what you want:
select columnA, count(*) as cnt from data.table where timestamp between '2017-01-01' and '2017-01-07' group by columnA
Union All
select columnB, count(*) as cnt from data.table where timestamp between '2017-01-01' and '2017-01-07' group by columnB
Union All
select columnB, count(distinct columnC) as cnt from data.table where timestamp between '2017-01-01' and '2017-01-07' and columnX in ('itemA','ItemB') group by columnB
Upvotes: 1
Reputation: 1269803
The following query expresses what you want to do:
select d.dte, coalesce(a.cnt, 0) as acnt, coalesce(b.cnt, 0) as bcnt,
b.c_cnt
from (select columnA as dte from data.table where timestamp between '2017-01-01' and '2017-01-07'
union
select columnB from data.table where timestamp between '2017-01-01' and '2017-01-07'
) d left join
(select columnA, count(*) as cnt
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnA
) a
on d.dte = a.columnA left join
(select columnB, count(*) as cnt,
count(distinct case when columnX in ('itemA','ItemB') then columnC end) as c_cnt
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnB
) b
on d.dte = b.columnB;
I think this is Hive-compatible, but occasional Hive has surprising deviations from other dialects of SQL.
Upvotes: 1
Reputation: 27869
Go with UNION ALL
:
select columnA, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnA
UNION ALL
select columnB, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnB
UNION ALL
select columnB, count(distinct columnC)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
and columnX in ('itemA','ItemB')
group by columnB
Upvotes: 1