Reputation: 263
I host a popular website and want to store certain user events to analyze later. Things like: clicked on item, added to cart, removed from cart, etc. I imagine about 5,000,000+ new events would be coming in every day.
My basic idea is to take the event, and store it in a row in Postgres along with a unique user id.
What are some strategies to handle this much data? I can't imagine one giant table is realistic. I've had a couple people recommend things like: dumping the tables into Amazon Redshift at the end of every day, Snowflake, Google BigQuery, Hadoop.
What would you do?
Upvotes: 1
Views: 1705
Reputation: 246268
I would partition the table, and as soon as you don't need the detailed data in the live system, detach a partition and export it to an archive and/or aggregate it and put the results into a data warehouse for analyses.
Upvotes: 1
Reputation: 2302
We have similar use case with PostgreSQL 10 and 11. We collect different metrics from customers' websites.
We have several partitioned tables for different data and together we collect per day more then 300 millions rows, i.e. 50-80 GB data daily. In some special days even 2x-3x more.
Collecting database keeps data for current and last day (because especially around midnight there can be big mess with timestamps from different part of the world).
On previous versions PG 9.x we transferred data 1x per day to our main PostgreSQL Warehouse DB (currently 20+ TB). Now we implemented logical replication from collecting database into Warehouse because sync of whole partitions was lately really heavy and long.
Beside of it we daily copy new data to Bigquery for really heavy analytical processing which would on PostgreSQL take like 24+ hours (real life results - trust me). On BQ we get results in minutes but pay sometimes a lot for it...
So daily partitions are reasonable segmentation. Especially with logical replication you do not need to worry. From our experiences I would recommend to not do any exports to BQ etc. from collecting database. Only from Warehouse.
Upvotes: 0