Reputation: 789

Count based on separate group by columns and with a condition

I am trying to combine three separate queries into one, and still produce the same results but as a single table. ColumnA and ColumnB both are actually in the date format of 'yyyy-mm-dd', ideally the final result will simply be a column of dates and the separate counts from each query.

select columnA, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnA

select columnB, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnB

select columnB, count(distinct columnC)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
and columnX in ('itemA','ItemB')
group by columnB

Upvotes: 0

Answers (4)

Dick McManus

Reputation: 789

I was able to get it to work using the following method:

With pullA as
(
  select columnA, count(*) as A_count
  from data.table
  group by columnA
),
pullB as
(
  select columnB, count(*) as B_count
  from data.table
  group by columnB
),

pullC as
(
  select columnB , count(*) as C_count
  from data.table
  where columnX in ('itemA', 'itemB')
  group by columnB
)

select ColumnB, A_count, B_count, C_count
from pullB
left join pullA
on ColumnB = ColumnA
left join pullC
on ColumnB = ColumnC

Is this approach any more or less efficient than a union or subquery approach?

Upvotes: 0

Steve Lovell

Reputation: 2564

The following would seem to be what you want:

select columnA, count(*) as cnt from data.table where timestamp between '2017-01-01' and '2017-01-07' group by columnA
Union All
select columnB, count(*) as cnt from data.table where timestamp between '2017-01-01' and '2017-01-07' group by columnB
Union All
select columnB, count(distinct columnC) as cnt from data.table where timestamp between '2017-01-01' and '2017-01-07' and columnX in ('itemA','ItemB') group by columnB

Upvotes: 1

Gordon Linoff

Reputation: 1269803

The following query expresses what you want to do:

select d.dte, coalesce(a.cnt, 0) as acnt, coalesce(b.cnt, 0) as bcnt,
       b.c_cnt
from (select columnA as dte from data.table where timestamp between '2017-01-01' and '2017-01-07'

      union
      select columnB from data.table where timestamp between '2017-01-01' and '2017-01-07'
     ) d left join
     (select columnA, count(*) as cnt
      from data.table
      where timestamp between '2017-01-01' and '2017-01-07'
      group by columnA
     ) a
     on d.dte = a.columnA left join
     (select columnB, count(*) as cnt,
             count(distinct case when columnX in ('itemA','ItemB') then columnC end) as c_cnt
      from data.table
      where timestamp between '2017-01-01' and '2017-01-07'
      group by columnB
     ) b
     on d.dte = b.columnB;

I think this is Hive-compatible, but occasional Hive has surprising deviations from other dialects of SQL.

Upvotes: 1

zipa

Reputation: 27869

Go with UNION ALL:

select columnA, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnA
UNION ALL
select columnB, count(*)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
group by columnB
UNION ALL
select columnB, count(distinct columnC)
from data.table
where timestamp between '2017-01-01' and '2017-01-07'
and columnX in ('itemA','ItemB')
group by columnB

Upvotes: 1

Count based on separate group by columns and with a condition

Answers (4)

Related Questions