Reputation: 1363
From lots of sources i am planning to use Amazon kinesis to catch the stream and after certain level of data transformation i want to direct the stream to Redshift Cluster in some table schema. Here i am not sure as is it right way to do this or not ?
From the Kineis documentation i have found that they have direct connector to redshift. However i have also found that Redshift looks better if we take bulk upload as data ware house system needs indexing. So the recommendation was to store all stream to S3 and then COPY command to make bulk push on redshift . Could someone please add some more view ?
Upvotes: 2
Views: 2573
Reputation: 12891
When you use the connector library for Kinesis you will be pushing data into Redshift, both through S3 and in batch.
It is true that calling INSERT INTO Redshift is not efficient as you are sending all the data through a single leader node instead of using the parallel power for Redshift that you get when running COPY from S3.
Since Kinesis is designed to handle thousands of events per second, running a COPY every few seconds or minutes will already batch many thousands of records.
If you want to squeeze the juice from Kinesis and Redshift, you can calculate exactly how many shards you need, how many nodes in Redshift you need and how many temporary files in S3 you need to accumulate from Kinisis, before calling the COPY command to Redshift.
Upvotes: 2