PostgreSQL: Count number of rows in table 1 for distinct rows in table 2

Question

I am working with really big data that at the moment I become confused, looking like I'm just repeating one thing.

I want to count the number of trips per user from two tables, trips and session.

psql=> SELECT * FROM trips limit 10;
trip_id |   session_ids   | daily_user_id | seconds_start | seconds_end 
---------+-----------------+---------------+---------------+-------------
  400543 | {172079}        |         17118 |    1575550944 |  1575551181
  400542 | {172078}        |         17118 |    1575541533 |  1575542171
  400540 | {172077}        |         17118 |    1575539001 |  1575539340
  400538 | {172076}        |         17117 |    1575540499 |  1575541999
  400534 | {172074,172075} |         17117 |    1575537161 |  1575539711
  400530 | {172073}        |         17116 |    1575447043 |  1575447682
  400529 | {172071}        |         17115 |    1575496394 |  1575497803
  400527 | {172070}        |         17113 |    1575495241 |  1575496034
  400525 | {172068}        |         17115 |    1575485658 |  1575489378
  400524 | {172067}        |         17113 |    1575488721 |  1575490491
(10 rows)

psql=> SELECT * FROM session limit 10;
session_id | user_id |           key            | start_time | daily_user_id 
------------+---------+--------------------------+------------+---------------
     172079 |      43 | hLB8S7aSfp4gAFp7TykwYQ==+| 1575550921 |         17118
            |         |                          |            | 
     172078 |      43 | YATMrL/AQ7Nu5q2dQTMT1A==+| 1575541530 |         17118
            |         |                          |            | 
     172077 |      43 | fOLX4tqvsyFOP3DCyBZf1A==+| 1575538997 |         17118
            |         |                          |            | 
     172076 |       7 | 88hwGj4Mqa58juy0PG/R4A==+| 1575540515 |         17117
            |         |                          |            | 
     172075 |       7 | 1O+8X49+YbtmoEa9BlY5OQ==+| 1575538384 |         17117
            |         |                          |            | 
     172074 |       7 | XOR7hsFCNk+soM75ZhDJyA==+| 1575537405 |         17117
            |         |                          |            | 
     172073 |      42 | rAQWwYgqg3UMTpsBYSpIpA==+| 1575447109 |         17116
            |         |                          |            | 
     172072 |     276 | 0xOsxRRN3Sq20VsXWjlrzQ==+| 1575511120 |         17114
            |         |                          |            | 
     172071 |       7 | P4beN3W/ZrD+TCpZGYh23g==+| 1575496642 |         17115
            |         |                          |            | 
     172070 |      43 | OFi30Zv9e5gmLZS5Vb+I7Q==+| 1575495238 |         17113
            |         |                          |            | 
(10 rows)

Goal: get the distribution of trips per user

Attempt:

psql=> SELECT COUNT(distinct trip_id) as trips
     , count(distinct user_id) as users
    , extract(year from to_timestamp(seconds_start)) as year_date
    , extract(month from to_timestamp(seconds_start)) as month_date
FROM trips
INNER JOIN session
    ON session_id = ANY(session_ids)
GROUP BY year_date, month_date  
ORDER BY year_date, month_date;
+-------+-------+-----------+------------+
| trips | users | year_date | month_date |
+-------+-------+-----------+------------+
| 371   |    44 |      2016 |          3 |
| 12207 |   185 |      2016 |          4 |
| 3859  |    88 |      2016 |          5 |
| 1547  |    28 |      2016 |          6 |
| 831   |    17 |      2016 |          7 |
| 427   |     4 |      2016 |          8 |
| 512   |    13 |      2016 |          9 |
| 431   |    11 |      2016 |         10 |
| 1011  |    26 |      2016 |         11 |
| 791   |    15 |      2016 |         12 |
| 217   |     8 |      2017 |          1 |
| 490   |    17 |      2017 |          2 |
| 851   |    18 |      2017 |          3 |
| 1890  |    66 |      2017 |          4 |
| 2143  |    43 |      2017 |          5 |
| .     |       |           |            |
| .     |       |           |            |
| .     |       |           |            |
+-------+-------+-----------+------------+

This resultset count number of users and trips, my intention is actually to get an analysis of trips per user, like so:

+------+-------------+
| user | no_of_trips |
+------+-------------+
| 1    |         489 |
| 2    |         400 |
| 3    |          12 |
| 4    |         102 |
| .    |             |
| .    |             |
| .    |             |
+------+-------------+

How do I do this, please?

Gordon Linoff · Accepted Answer

You seem to just want aggregation by user_id:

SELECT s.user_id, COUNT(distinct t.trip_id) as trips
FROM trips t INNER JOIN
     session s
     ON s.session_id = ANY(t.session_ids)
GROUP BY s.user_id ;

I'm pretty sure that the COUNT(DISTINCT) is unnecessary, so I would advise removing it:

SELECT s.user_id, COUNT(*) as trips
FROM trips t INNER JOIN
     session s
     ON s.session_id = ANY(t.session_ids)
GROUP BY s.user_id ;

PostgreSQL: Count number of rows in table 1 for distinct rows in table 2

Answers (1)

Related Questions