Reputation: 153
I have a simple table: user(id, date, task)
The task field contains either "download" or "upload"
I want to figure out the number of users who do each action per day.
Output: date, # of users who downloaded, # of users who uploaded
I first ran into the issue of using a subquery in the aggregate count function of the select, so I thought I should be using a self join here to break apart the data in the "task" column.
I thought I could create to tables for each case and then combine those and count, but I am having trouble finishing this out:
SELECT id, date, task as task_download FROM user WHERE task = 'download'
SELECT id, date, task as task_upload FROM user WHERE task = 'upload'
Upvotes: 0
Views: 1477
Reputation: 108390
I'd use conditional aggregation.
To get a count of the number of users that performed at least one upload on a given date (but only increment the count by one for that user for that date, even if that user performed more than one upload on the same date), we can use a COUNT(DISTINCT user)
expression.
To get a count of the total number of uploads, we can use a COUNT or SUM.
SELECT DATE(t.date) AS `date`
, COUNT(DISTINCT IF(t.task='upload' ,t.user,NULL)) AS cnt_users_who_uploaded
, COUNT(DISTINCT IF(t.task='download',t.user,NULL)) AS cnt_users_who_downloaded
, SUM(IF(t.task='upload' ,1,0)) AS cnt_uploads
, SUM(IF(t.task='download',1,0)) AS cnt_downloads
FROM user t
GROUP BY DATE(t.date)
ORDER BY DATE(t.date)
Note: this will not return counts of zero for dates where there are no rows for that date
does not appear in the table.
Upvotes: 0
Reputation: 6783
I would say, neither nor. Just a query like this will do the job:
select `date`,
count(distinct case when task = 'download' then id else null end) as downloads,
count(distinct case when task = 'upload' then id else null end) as uploads
from user
where task in ('download', 'upload')
group by `date`
assuming, date
is a column containing only the date part and not the complete timestamp and id
is the user id. You can use the distinct
keyword within aggregate functions, that's what I did here.
To have this query run appropriately fast, I recommend using an index on task,date
If, however, date
contains the complete timestamp (i.e. including the time-part) you would want to group differently:
select `date`,
count(distinct case when task = 'download' then id else null end) as downloads,
count(distinct case when task = 'upload' then id else null end) as uploads
from user
where task in ('download', 'upload')
group by date(`date`)
Upvotes: 2
Reputation: 1269693
If you want the distinct users, then that suggests count(distinct)
:
SELECT date,
COUNT(DISTINCT CASE WHEN task = 'upload' THEN userid END) as uploads,
COUNT(DISTINCT CASE WHEN task = 'download' THEN userid END) as downloads
FROM user
GROUP BY date
ORDER BY date;
If you want distinct actions then you can do this as:
SELECT date,
SUM( (task = 'upload')::int ) as uploads,
SUM( (task = 'download')::int) as downloads
FROM user
GROUP BY date
ORDER BY date;
This uses a convenient Postgres shorthand for counting the boolean expressions.
Upvotes: 1
Reputation: 42753
select `date`,
COUNT( distinct CASE WHEN task = 'download' then id end ) 'download',
COUNT( distinct CASE WHEN task = 'upload' then id end ) 'upload'
from user
group by `date`
Upvotes: 3
Reputation: 10807
First count distinct users by date and task, and then sum users depending on each task by date.
select date,
sum(case when task = 'upload' then num_users else 0 end) as "upload",
sum(case when task = 'download' then num_users else 0 end) as "download"
from (
select date, task, count(distinct id) num_users
from usert
group by date, task
) x
group by date
;
Check it here: http://rextester.com/ZACFB64945
Upvotes: 1
Reputation: 30819
You can do it with sub-queries, e.g.:
SELECT `date` AS `day`,
(SELECT COUNT(*) FROM activity WHERE date = day AND activity = 'upload') AS upload_count,
(SELECT COUNT(*) FROM activity WHERE date = day AND activity = 'download') AS download_count
FROM activity
GROUP BY date;
Here's the SQL Fiddle.
Upvotes: 1