Di Zou
Di Zou

Reputation: 4609

How do I do fuzzy group by in mysql?

I have this query in MySQL:

select 
    hour,
    traffic_source_name,
    COUNT(*)
from 
    my_data_table
group by
    hour,
    traffic_source_name

This gives me results like this:

'h', 'traffic_source_name', 'COUNT(*)
'2015-07-28 00:00', 'google', '628'
'2015-07-28 00:00', 'Google Tier 2 Search', '1'
'2015-07-28 00:00', 'Taboola/Outbrain Content', '3'
'2015-07-28 00:00', 'yahoo', '3'
'2015-07-28 00:00', 'Email', '67'
'2015-07-28 00:00', 'msn', '253'
'2015-07-28 00:00', 'Facebook Social Media', '139'
'2015-07-28 01:00', 'yahoo', '2'
'2015-07-28 01:00', 'Email', '46'
'2015-07-28 01:00', 'Popular Marketing', '1'
'2015-07-28 01:00', 'Yahoo Stream Ads Content', '3'
'2015-07-28 01:00', 'Facebook Social Media', '183'
'2015-07-28 01:00', 'google', '530'
'2015-07-28 01:00', 'msn', '210'

The thing is though, I would like to combine 'google' and 'Google Tier 2 Search' into one row as 'google' and 'yahoo' and 'Yahoo Stream Ads Content' into 'yahoo'. How would I go about doing this?

Upvotes: 1

Views: 332

Answers (1)

Uueerdo
Uueerdo

Reputation: 15961

Transform the values before the group, like so:

select 
    hour,
    IF (traffic_source_name IN ('google', 'Google Tier 2 Search'), 'google'
    , IF (traffic_source_name IN ('yahoo', 'Yahoo Stream Ads Content'), 'yahoo'
    , traffic_source_name
    )) AS tsn,
    COUNT(*)
from 
    my_data_table
group by
    hour,
    tsn

this might have better readability though:

select 
    hour,
    CASE traffic_source_name 
    WHEN 'Google Tier 2 Search' THEN 'google'
    WHEN 'Yahoo Stream Ads Content' THEN 'yahoo'
    ELSE traffic_source_name
    END AS tsn,
    COUNT(*)
from 
    my_data_table
group by
    hour,
    tsn

The first lets you make the grouped values more obvious, while the second lets you avoid having to maintain the nested IF(,,IF(,,IF...))) stuff.... or we could do this for the best of both:

select 
    hour,
    CASE 
    WHEN traffic_source_name IN ('google', 'Google Tier 2 Search') 
      THEN 'google'
    WHEN traffic_source_name IN ('yahoo', 'Yahoo Stream Ads Content')
      THEN 'yahoo'
    ELSE traffic_source_name
    END AS tsn,
    COUNT(*)
from 
    my_data_table
group by
    hour,
    tsn

Upvotes: 2

Related Questions