Reputation: 587
I'm looking to build a RESTful API in Go that would be in charge of inserting data sent by multiple mobile apps into an Amazon Redshift cluster, possibly receiving tens of thousands of requests per second.
From what I have read, Redshift gives slow insert speeds. That's why a few people have advised me to use an intermediate store like DynamoDB or S3, perform the inserts there first, and then import the data into Redshift later.
I'm wondering why I would need Redshift in that case, as the data would already be stored in a database. Do you think I can proceed differently?
I have also thought of a simpler solution: writing to a queue and progressively inserting the data into Redshift. But I'm worried the queue would keep growing if the insert speed can't keep up with the incoming data. Something like the sketch below is roughly what I have in mind.
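Here is a minimal sketch of that queue idea, assuming a single process; the `Event` struct, table name, and connection string are placeholders I made up:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"strings"
	"time"

	_ "github.com/lib/pq" // Redshift speaks the Postgres wire protocol
)

// Event is a hypothetical payload sent by the mobile apps.
type Event struct {
	DeviceID string
	Payload  string
}

// batchInsert flushes a slice of events with one multi-row INSERT,
// which is much cheaper than one INSERT per event.
func batchInsert(db *sql.DB, events []Event) error {
	if len(events) == 0 {
		return nil
	}
	values := make([]string, 0, len(events))
	args := make([]interface{}, 0, len(events)*2)
	for i, e := range events {
		values = append(values, fmt.Sprintf("($%d, $%d)", i*2+1, i*2+2))
		args = append(args, e.DeviceID, e.Payload)
	}
	query := "INSERT INTO events (device_id, payload) VALUES " + strings.Join(values, ", ")
	_, err := db.Exec(query, args...)
	return err
}

func main() {
	// Hypothetical DSN; Redshift accepts Postgres-compatible connections.
	db, err := sql.Open("postgres", "postgres://user:pass@my-cluster.example.redshift.amazonaws.com:5439/mydb")
	if err != nil {
		log.Fatal(err)
	}

	queue := make(chan Event, 100000) // bounded in-memory queue, so it can't grow forever

	// Consumer: drain the queue into batches, flushing whenever a batch
	// fills up or every few seconds, whichever comes first.
	go func() {
		batch := make([]Event, 0, 1000)
		ticker := time.NewTicker(5 * time.Second)
		for {
			select {
			case e := <-queue:
				batch = append(batch, e)
				if len(batch) < cap(batch) {
					continue
				}
			case <-ticker.C:
			}
			if err := batchInsert(db, batch); err != nil {
				log.Println("flush failed:", err)
			}
			batch = batch[:0]
		}
	}()

	// The HTTP handlers would write into `queue` instead of hitting Redshift directly.
	select {}
}
```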
Thanks in advance for your help! :-)
Upvotes: 0
Views: 1524
Reputation: 270039
Advice like this is normally off-topic for StackOverflow, but...
Amazon Redshift is a massively parallel processing (MPP) database with an SQL interface. It can query terabytes and even petabytes of data very efficiently.
You ask "why would I need to use Redshift" -- the answer is: when your querying requirements cannot be satisfied by a traditional database. If a normal database can satisfactorily serve your queries, then there's no real reason to use Redshift.
However, if your queries need Redshift, then you should continue to use it. The design of Redshift is such that the most efficient way to insert data is to load it from Amazon S3 via the `COPY` command. It is inefficient to insert data via normal `INSERT` statements unless they are inserting many rows per `INSERT` statement (eg hundreds or thousands).
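For illustration, issuing a `COPY` from Go is just a statement executed over a Postgres-compatible connection. This is a rough sketch; the table, bucket, and IAM role ARN are made-up placeholders:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Redshift is reachable over the Postgres wire protocol
)

func main() {
	// Hypothetical connection string.
	db, err := sql.Open("postgres", "postgres://user:pass@my-cluster.example.redshift.amazonaws.com:5439/mydb")
	if err != nil {
		log.Fatal(err)
	}

	// COPY loads the staged S3 files in parallel across the cluster's nodes,
	// which is the fast path for bulk ingestion into Redshift.
	_, err = db.Exec(`
		COPY events
		FROM 's3://my-bucket/staged-events/'
		IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
		FORMAT AS JSON 'auto'`)
	if err != nil {
		log.Fatal(err)
	}
}
```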
So, some questions to ask:

- Do your query requirements actually need Redshift, or could a traditional database satisfy them?
- Can you batch the incoming data (for example in Amazon S3) and load it periodically via `COPY`?
You could also consider using Amazon Kinesis Firehose, which can accept a stream of data and load it into an Amazon Redshift database automatically -- behind the scenes it buffers the incoming records, stages them in Amazon S3, and issues the `COPY` commands for you.
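As a rough sketch of what the producer side might look like with the AWS SDK for Go (the delivery stream name and JSON payload are assumptions, not anything from your setup):

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/firehose"
)

func main() {
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1")))
	fh := firehose.New(sess)

	// Each record is handed to the delivery stream; Firehose does the
	// buffering, S3 staging, and Redshift COPY on its own schedule.
	_, err := fh.PutRecord(&firehose.PutRecordInput{
		DeliveryStreamName: aws.String("my-delivery-stream"), // hypothetical stream name
		Record: &firehose.Record{
			Data: []byte(`{"device_id":"abc123","payload":"..."}` + "\n"),
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```

This moves the queueing and batching problem out of your API entirely, so the queue-growth concern you mention goes away.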
Upvotes: 2