Reputation: 1802
I have 3 tables in my Redshift database, and data arrives from 3 different CSV files in S3 every few seconds. One table has ~3 billion records and the other 2 have ~100 million records. For near-real-time reporting, I have to merge these tables into 1 table. How do I achieve this in Redshift?
Upvotes: 0
Views: 321
Reputation: 1227
Near Real Time Data Loads in Amazon Redshift
I would say that the first step is to consider whether Redshift is the best platform for the workload you are considering. Redshift is not an optimal platform for streaming data.
Redshift's architecture is better suited to batch inserts than to streaming inserts. COMMITs are costly in Redshift.
You need to consider the performance impact of VACUUM and ANALYZE if those operations are going to compete for resources with streaming data.
It might still make sense to use Redshift on your project, depending on the entire set of requirements and workload, but bear in mind that in order to use Redshift you will have to engineer around it, and probably change your workload from a "near-real-time" architecture to a micro-batch architecture.
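To make the micro-batch idea concrete, here is a minimal sketch in Python that only builds the SQL for one batch; it does not connect to anything. It assumes the CSV files that arrived during the interval are listed in an S3 manifest, that they are loaded into a staging table with a single COPY, and that the staging rows are then merged into the target with the usual delete-then-insert pattern, all inside one transaction so each batch costs one COMMIT. The function name, table names, key column, bucket, and IAM role are all hypothetical placeholders, not anything from the question.

```python
from typing import List


def build_micro_batch_sql(target: str, staging: str, key: str,
                          manifest_url: str, iam_role: str) -> List[str]:
    """Return the SQL statements for one micro batch, in execution order.

    All names passed in are illustrative; identifiers are interpolated
    directly, so they must come from trusted configuration, not user input.
    """
    return [
        # One transaction per batch, so the whole batch pays for one COMMIT.
        "BEGIN;",
        # DELETE rather than TRUNCATE: TRUNCATE commits implicitly in Redshift.
        f"DELETE FROM {staging};",
        # Load every file listed in the manifest with a single COPY.
        f"COPY {staging} FROM '{manifest_url}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS CSV MANIFEST;",
        # Remove target rows that the new batch replaces, then append the batch.
        f"DELETE FROM {target} USING {staging} "
        f"WHERE {target}.{key} = {staging}.{key};",
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "COMMIT;",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for statement in build_micro_batch_sql(
        target="events",
        staging="events_staging",
        key="event_id",
        manifest_url="s3://example-bucket/batches/batch-0001.manifest",
        iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ):
        print(statement)
```

You would run these statements through whatever SQL client you already use, once per interval (for example every few minutes rather than every few seconds), and schedule VACUUM and ANALYZE outside the load window.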
To summarize:
Upvotes: 1