madman
madman

Reputation: 153

SQL: Union or Self Join

I have a simple table: user(id, date, task)

The task field contains either "download" or "upload"

I want to figure out the number of users who do each action per day.

Output: date, # of users who downloaded, # of users who uploaded

I first ran into the issue of using a subquery in the aggregate count function of the select, so I thought I should be using a self join here to break apart the data in the "task" column.

I thought I could create to tables for each case and then combine those and count, but I am having trouble finishing this out:

SELECT id, date, task as task_download FROM user WHERE task = 'download'

SELECT id, date, task as task_upload FROM user WHERE task = 'upload'

Upvotes: 0

Views: 1477

Answers (6)

spencer7593
spencer7593

Reputation: 108390

I'd use conditional aggregation.

To get a count of the number of users that performed at least one upload on a given date (but only increment the count by one for that user for that date, even if that user performed more than one upload on the same date), we can use a COUNT(DISTINCT user) expression.

To get a count of the total number of uploads, we can use a COUNT or SUM.

SELECT DATE(t.date) AS `date`
     , COUNT(DISTINCT IF(t.task='upload'  ,t.user,NULL)) AS cnt_users_who_uploaded
     , COUNT(DISTINCT IF(t.task='download',t.user,NULL)) AS cnt_users_who_downloaded
     , SUM(IF(t.task='upload'  ,1,0))                    AS cnt_uploads
     , SUM(IF(t.task='download',1,0))                    AS cnt_downloads
  FROM user t
 GROUP BY DATE(t.date)
 ORDER BY DATE(t.date)

Note: this will not return counts of zero for dates where there are no rows for that date does not appear in the table.

Upvotes: 0

Psi
Psi

Reputation: 6783

I would say, neither nor. Just a query like this will do the job:

select `date`, 
    count(distinct case when task = 'download' then id else null end) as downloads, 
    count(distinct case when task = 'upload' then id else null end) as uploads
from user
where  task in ('download', 'upload')
group by `date`

assuming, date is a column containing only the date part and not the complete timestamp and id is the user id. You can use the distinct keyword within aggregate functions, that's what I did here.

To have this query run appropriately fast, I recommend using an index on task,date

If, however, date contains the complete timestamp (i.e. including the time-part) you would want to group differently:

select `date`, 
    count(distinct case when task = 'download' then id else null end) as downloads, 
    count(distinct case when task = 'upload' then id else null end) as uploads
from user
where  task in ('download', 'upload')
group by date(`date`)

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269693

If you want the distinct users, then that suggests count(distinct):

SELECT date, 
       COUNT(DISTINCT CASE WHEN task = 'upload' THEN userid END) as uploads,
       COUNT(DISTINCT CASE WHEN task = 'download' THEN userid END) as downloads
FROM user
GROUP BY date
ORDER BY date;

If you want distinct actions then you can do this as:

SELECT date, 
       SUM( (task = 'upload')::int ) as uploads,
       SUM( (task = 'download')::int) as downloads
FROM user
GROUP BY date
ORDER BY date;

This uses a convenient Postgres shorthand for counting the boolean expressions.

Upvotes: 1

Oto Shavadze
Oto Shavadze

Reputation: 42753

select  `date`, 
COUNT( distinct CASE WHEN task = 'download' then id end ) 'download', 
COUNT( distinct CASE WHEN task = 'upload' then id end ) 'upload'
from user
group by  `date`

Upvotes: 3

McNets
McNets

Reputation: 10807

First count distinct users by date and task, and then sum users depending on each task by date.

select date,
       sum(case when task = 'upload' then num_users else 0 end) as "upload",
       sum(case when task = 'download' then num_users else 0 end) as "download"
from  (       
       select   date, task, count(distinct id) num_users
       from     usert
       group by date, task
      ) x
group by date
;

Check it here: http://rextester.com/ZACFB64945

Upvotes: 1

Darshan Mehta
Darshan Mehta

Reputation: 30819

You can do it with sub-queries, e.g.:

SELECT `date` AS `day`,
(SELECT COUNT(*) FROM activity WHERE date = day AND activity = 'upload') AS upload_count,
(SELECT COUNT(*) FROM activity WHERE date = day AND activity = 'download') AS download_count
FROM activity
GROUP BY date;

Here's the SQL Fiddle.

Upvotes: 1

Related Questions