Reputation: 39263
I am working with visitor log data and need to summarize it by IP address. The data looks like this:
id | ip_address | type | message | ... ----------+----------------+----------+---------------- 1 | 1.2.3.4 | purchase | ... 2 | 1.2.3.4 | visit | ... 3 | 3.3.3.3 | visit | ... 4 | 3.3.3.3 | purchase | ... 5 | 4.4.4.4 | visit | ... 6 | 4.4.4.4 | visit | ...
And should summarize with:
type="purchase" DESC, type="visit" DESC, id DESC
The yield:
chosenid | ip_address | type | message | ... ----------+----------------+----------+---------------- 1 | 1.2.3.4 | purchase | ... 4 | 3.3.3.3 | purchase | ... 6 | 4.4.4.4 | visit | ...
Is there an elegant way to get this data?
An ugly approach follows:
set @row_num = 0; CREATE TEMPORARY TABLE IF NOT EXISTS tt AS SELECT *,@row_num:=@row_num+1 as row_index FROM log ORDER BY type="purchase" DESC, type="visit" DESC, id DESC ORDER BY rating desc;
Then get the minimum row_index and id for each ip_address (https://stackoverflow.com/questions/121387/fetch-the-row-which-has-the-max-value-for-a-column)
Then join those id's back to the original table
Upvotes: 1
Views: 665
Reputation: 1269823
The following query gets the most recent id
based on your rules by using a correlated subquery:
select t.ip_adddress,
(select t2.id
from table t2
where t2.ip_address = t1.ip_address
order by type = 'purchase' desc, id desc
limit 1
) as mostrecent
from (select distinct t.ip_address
from table t
) t;
The idea is to sort the data first by purchases (with id descending too) and then by visits and choose the first one in the list. If you have a table of ipaddresses, then you don't need the distinct
subquery. Just use that table instead.
To get the final results, we can join
to this or use in
or exists
. This uses in
.
select t.*
from table t join
(select id, (select t2.id
from table t2
where t2.ip_address = t1.ip_address
order by type = 'purchase' desc, id desc
limit 1
) as mostrecent
from (select distinct t.ip_address
from table t
) t
) ids
on t.id = ids.mostrecent;
This query will work best if there is an index on table(ip_address, type, id)
.
Upvotes: 0
Reputation: 35533
You can use Bill Karwin's approach here:
SELECT t1.*
FROM (SELECT *, CASE WHEN type = 'purchase' THEN 1 ELSE 0 END is_purchase FROM myTable) t1
LEFT JOIN (SELECT *, CASE WHEN type = 'purchase' THEN 1 ELSE 0 END is_purchase FROM myTable) t2
ON t1.ip_address = t2.ip_address
AND (t2.is_purchase > t1.is_purchase
OR (t2.is_purchase = t1.is_purchase AND t2.id > t1.id))
WHERE t2.id IS NULL
SQL Fiddle here
Upvotes: 0
Reputation: 49049
I think this should be what you need:
SELECT yourtable.*
FROM
yourtable INNER JOIN (
SELECT ip_address,
MAX(CASE WHEN type='purchase' THEN id END) max_purchase,
MAX(CASE WHEN type='visit' THEN id END) max_visit
FROM yourtable
GROUP BY ip_address) m
ON yourtable.id = COALESCE(max_purchase, max_visit)
Please see fiddle here.
My subquery will return the maximum purchase id (or null if there's no purchase) and the maximum visit id. Then I'm joining the table with COALESCE, if max_purchase is not null the join will be on max_purchase, otherwise it will be on max_visit.
Upvotes: 1