Reputation: 3973
I am working with really big data that at the moment I become confused, looking like I'm just repeating one thing.
I want to count the number of trips per user from two tables, trips
and session
.
psql=> SELECT * FROM trips limit 10;
trip_id | session_ids | daily_user_id | seconds_start | seconds_end
---------+-----------------+---------------+---------------+-------------
400543 | {172079} | 17118 | 1575550944 | 1575551181
400542 | {172078} | 17118 | 1575541533 | 1575542171
400540 | {172077} | 17118 | 1575539001 | 1575539340
400538 | {172076} | 17117 | 1575540499 | 1575541999
400534 | {172074,172075} | 17117 | 1575537161 | 1575539711
400530 | {172073} | 17116 | 1575447043 | 1575447682
400529 | {172071} | 17115 | 1575496394 | 1575497803
400527 | {172070} | 17113 | 1575495241 | 1575496034
400525 | {172068} | 17115 | 1575485658 | 1575489378
400524 | {172067} | 17113 | 1575488721 | 1575490491
(10 rows)
psql=> SELECT * FROM session limit 10;
session_id | user_id | key | start_time | daily_user_id
------------+---------+--------------------------+------------+---------------
172079 | 43 | hLB8S7aSfp4gAFp7TykwYQ==+| 1575550921 | 17118
| | | |
172078 | 43 | YATMrL/AQ7Nu5q2dQTMT1A==+| 1575541530 | 17118
| | | |
172077 | 43 | fOLX4tqvsyFOP3DCyBZf1A==+| 1575538997 | 17118
| | | |
172076 | 7 | 88hwGj4Mqa58juy0PG/R4A==+| 1575540515 | 17117
| | | |
172075 | 7 | 1O+8X49+YbtmoEa9BlY5OQ==+| 1575538384 | 17117
| | | |
172074 | 7 | XOR7hsFCNk+soM75ZhDJyA==+| 1575537405 | 17117
| | | |
172073 | 42 | rAQWwYgqg3UMTpsBYSpIpA==+| 1575447109 | 17116
| | | |
172072 | 276 | 0xOsxRRN3Sq20VsXWjlrzQ==+| 1575511120 | 17114
| | | |
172071 | 7 | P4beN3W/ZrD+TCpZGYh23g==+| 1575496642 | 17115
| | | |
172070 | 43 | OFi30Zv9e5gmLZS5Vb+I7Q==+| 1575495238 | 17113
| | | |
(10 rows)
Goal: get the distribution of trips per user
Attempt:
psql=> SELECT COUNT(distinct trip_id) as trips
, count(distinct user_id) as users
, extract(year from to_timestamp(seconds_start)) as year_date
, extract(month from to_timestamp(seconds_start)) as month_date
FROM trips
INNER JOIN session
ON session_id = ANY(session_ids)
GROUP BY year_date, month_date
ORDER BY year_date, month_date;
+-------+-------+-----------+------------+
| trips | users | year_date | month_date |
+-------+-------+-----------+------------+
| 371 | 44 | 2016 | 3 |
| 12207 | 185 | 2016 | 4 |
| 3859 | 88 | 2016 | 5 |
| 1547 | 28 | 2016 | 6 |
| 831 | 17 | 2016 | 7 |
| 427 | 4 | 2016 | 8 |
| 512 | 13 | 2016 | 9 |
| 431 | 11 | 2016 | 10 |
| 1011 | 26 | 2016 | 11 |
| 791 | 15 | 2016 | 12 |
| 217 | 8 | 2017 | 1 |
| 490 | 17 | 2017 | 2 |
| 851 | 18 | 2017 | 3 |
| 1890 | 66 | 2017 | 4 |
| 2143 | 43 | 2017 | 5 |
| . | | | |
| . | | | |
| . | | | |
+-------+-------+-----------+------------+
This resultset count number of users and trips, my intention is actually to get an analysis of trips per user, like so:
+------+-------------+
| user | no_of_trips |
+------+-------------+
| 1 | 489 |
| 2 | 400 |
| 3 | 12 |
| 4 | 102 |
| . | |
| . | |
| . | |
+------+-------------+
How do I do this, please?
Upvotes: 1
Views: 35
Reputation: 1269563
You seem to just want aggregation by user_id
:
SELECT s.user_id, COUNT(distinct t.trip_id) as trips
FROM trips t INNER JOIN
session s
ON s.session_id = ANY(t.session_ids)
GROUP BY s.user_id ;
I'm pretty sure that the COUNT(DISTINCT)
is unnecessary, so I would advise removing it:
SELECT s.user_id, COUNT(*) as trips
FROM trips t INNER JOIN
session s
ON s.session_id = ANY(t.session_ids)
GROUP BY s.user_id ;
Upvotes: 1