Reputation: 23
I have a table where one or more entries with the same 'id' value can be inserted into our log / fact table (contains over 100+ million records)
At a set frequency a new record is inserted into this table with a new value for the columns 'created' and 'view_percent' (percentage of a video viewed).
With two different queries, I would like to return:
Desired Result 1:
+------------------+--------------+-------------+------------------+------------+
| archive_asset_id | asset_title | count_asset | avg_view | time_day |
+------------------+--------------+-------------+------------------+------------+
| 83386 | Oliver James | 4 | 36.75 | 2010-08-09 |
+------------------+--------------+-------------+------------------+------------+
With this query I am not able to filter out the necessary records when performing the aggregate function... instead the average of all rows taken with a resulting value of 31.307
SELECT archive_asset_id, asset_title, COUNT(DISTINCT id * 1000000 + archive_asset_id) AS count_asset, AVG(view_percent) AS avg_view, FROM_UNIXTIME(created, '%Y-%m-%d') AS time_day
FROM log_embed_video
WHERE archive_asset_id = 83386
AND created >= 1281312000
AND created < 1281484800
GROUP BY time_day
ORDER BY time_day;
Desired Result 2:
+------------+------------------+---------------+------------------+-------------------------------+
| time_day | archive_asset_id | asset_title | MAX(view_percent) | occurrences MAX(view_percent) |
+------------+------------------+---------------+------------------+-------------------------------+
| 2010-08-09 | 83386 | Oliver James | 13 | 1 |
| 2010-08-09 | 83386 | Oliver James | 17 | 2 |
| 2010-08-09 | 83386 | Oliver James | 100 | 1 |
+------------+------------------+---------------+-------------------+------------------------------+
This is the query that I have used for result 2, but not quite what I want... the group by log_embed_video.id yields 4 results... which is to be expected for the given query, but not the desired output.
SELECT id, FROM_UNIXTIME(created, '%Y-%m-%d') AS time_day, archive_asset_id, asset_title, COUNT(DISTINCT id * 1000000 + archive_asset_id) AS 'count_asset', MAX(view_percent) as 'max_view_percent'
FROM log_embed_video
WHERE archive_asset_id = 83386
AND created >= 1281312000
AND created < 1281484800
GROUP BY time_day, id
Conditioned Data:
The rows mark with KEEP is the data I want to work with when returning result 1 and result 2.
SELECT id, archive_asset_id, asset_title, view_percent, FROM_UNIXTIME(created, '%Y-%m-%d') AS time_day
FROM log_embed_video
WHERE archive_asset_id = 83386
AND created >= 1281312000
AND created < 1281484800
ORDER BY id, view_percent;
+----------+------------------+--------------+--------------+------------+
| id | archive_asset_id | asset_title | view_percent | time_day |
+----------+------------------+--------------+--------------+------------+
| 43326898 | 83386 | Oliver James | 0 | 2010-08-09 | - DISCARD RECORD / DUPLICATE
| 43326898 | 83386 | Oliver James | 13 | 2010-08-09 | + KEEP
| 43432090 | 83386 | Oliver James | 0 | 2010-08-09 | - DISCARD RECORD / DUPLICATE
| 43432090 | 83386 | Oliver James | 17 | 2010-08-09 | + KEEP
| 43432092 | 83386 | Oliver James | 0 | 2010-08-09 | - DISCARD RECORD / DUPLICATE
| 43432092 | 83386 | Oliver James | 17 | 2010-08-09 | + KEEP
| 43470093 | 83386 | Oliver James | 0 | 2010-08-09 | - DISCARD RECORD / DUPLICATE
| 43470093 | 83386 | Oliver James | 17 | 2010-08-09 | - DISCARD RECORD / DUPLICATE
| 43470093 | 83386 | Oliver James | 35 | 2010-08-09 | - DISCARD RECORD / DUPLICATE
| 43470093 | 83386 | Oliver James | 52 | 2010-08-09 | - DISCARD RECORD / DUPLICATE
| 43470093 | 83386 | Oliver James | 69 | 2010-08-09 | - DISCARD RECORD / DUPLICATE
| 43470093 | 83386 | Oliver James | 87 | 2010-08-09 | - DISCARD RECORD / DUPLICATE
| 43470093 | 83386 | Oliver James | 100 | 2010-08-09 | + KEEP
+----------+------------------+--------------+--------------+------------+
Table and Raw Data:
CREATE TABLE `log_embed_video` (
`id` int(11) NOT NULL,
`archive_asset_id` int(11) NOT NULL,
`asset_title` varchar(255) NOT NULL,
`view_percent` float NOT NULL,
`created` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `log_embed_video` VALUES
(43326898, 83386, 'Oliver James', 0, 1281327306),
(43326898, 83386, 'Oliver James', 13, 1281327327),
(43432090, 83386, 'Oliver James', 0, 1281371423),
(43432090, 83386, 'Oliver James', 17, 1281371445),
(43432092, 83386, 'Oliver James', 0, 1281371424),
(43432092, 83386, 'Oliver James', 17, 1281371446),
(43470093, 83386, 'Oliver James', 0, 1281380789),
(43470093, 83386, 'Oliver James', 17, 1281380810),
(43470093, 83386, 'Oliver James', 35, 1281380830),
(43470093, 83386, 'Oliver James', 52, 1281380850),
(43470093, 83386, 'Oliver James', 69, 1281380871),
(43470093, 83386, 'Oliver James', 87, 1281380891),
(43470093, 83386, 'Oliver James', 100, 1281380906);
Upvotes: 2
Views: 1513
Reputation: 70540
All max-percentage per-id rows for a unique id:
SELECT a.*
FROM log_embed_video a
LEFT JOIN log_embed_video b
ON b.id = a.id
AND b.view_percent > a.view_percent
WHERE b.id IS NULL
-- possibly limit on date for more performance.
Performance wise this is better:
SELECT * FROM (
SELECT id, archive_asset_id, asset_title, view_percent, created,
@rn := IF(id != @old_id,1,@rn + 1) as rownumber,
@old_id := id
FROM log_embed_video
JOIN (SELECT @rn:=0,@old_id:=0) void
ORDER BY id, view_percent DESC
) a WHERE rownumber=1;
Upvotes: 1
Reputation: 12704
Check if this will make it more clear for you
SELECT archive_asset_id, AVG(actual_percent)
FROM (SELECT id, archive_asset_id, asset_title,
MAX(view_percent) as actual_percent
FROM log_embed_video GROUP by id) T
GROUP BY archive_asset_id;
It returns:
+------------------+---------------------+
| archive_asset_id | AVG(actual_percent) |
+------------------+---------------------+
| 83386 | 36.75 |
+------------------+---------------------+
A few notes
COUNT(DISTINCT id * 1000000 + archive_asset_id)
caught my eye as something bizarre; are you sure you don't mean simply COUNT(*)
or COUNT(id)
?EDIT:
For the second one
SELECT archive_asset_id, actual_percent, count(*)
FROM (SELECT id, archive_asset_id, asset_title,
MAX(view_percent) as actual_percent
FROM log_embed_video GROUP by id) T
GROUP BY archive_asset_id, actual_percent;
+------------------+----------------+----------+
| archive_asset_id | actual_percent | count(*) |
+------------------+----------------+----------+
| 83386 | 13 | 1 |
| 83386 | 17 | 2 |
| 83386 | 100 | 1 |
+------------------+----------------+----------+
Upvotes: 0