Deepak Singhal

Reputation: 10874

How to put log files in key-value format into Redshift (from S3 or directly from app servers)

I have my logs in key-value format, and the key-value pairs can vary from log to log. I want to do analytics on them. Since the data is unstructured, I first thought of putting it in DynamoDB, but for analytics Redshift is the better fit. I also might not need to persist every key-value pair into Redshift, but that is optional. A few options I was considering:

  1. Put the logs into S3, then use the COPY command. But I couldn't find a way to convert the key-value pairs to JSON for COPY, because COPY accepts only JSON or CSV (a conversion sketch follows this list).
  2. Use a Kinesis stream to receive the log files. But then what is the best way to consume them: Lambda or the Kinesis Client Library? One option was to use the Kinesis agent's formatter to convert the files to JSON, but that is not very flexible. And what comes after that?
  3. Put the log files into CloudWatch Logs. But then how do I consume them?
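
Since COPY only accepts JSON or CSV, the simplest path for option 1 is to convert the key-value lines to JSON Lines before (or while) uploading to S3. Here is a minimal Python sketch; it assumes space-separated `key=value` pairs and placeholder file names (`app.log`, `app.jsonl`), so adjust the parsing for your actual format:

```python
import json

def kv_line_to_json(line):
    # Assumes space-separated 'key=value' pairs; adjust the split
    # logic if your keys or values can contain spaces.
    record = {}
    for field in line.split():
        if "=" in field:
            key, value = field.split("=", 1)  # split on first '=' only
            record[key] = value
    return json.dumps(record)

# Rewrite the raw log as JSON Lines, one JSON object per log entry,
# which Redshift COPY can load with FORMAT AS JSON 'auto'.
with open("app.log") as src, open("app.jsonl", "w") as dst:
    for line in src:
        line = line.strip()
        if line:
            dst.write(kv_line_to_json(line) + "\n")
```

Because the keys vary between logs, `JSON 'auto'` matches each object's keys to column names and should ignore keys that have no matching column, which also covers the "might not persist all key-values" requirement.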

Upvotes: 2

Views: 585

Answers (2)

Murtaza Kanchwala

Reputation: 2483

AWS published a very interesting blog post on this. See if it fulfills your requirement:

ETL Processing of Web Server Logs using AWS EMR and DataPipeline

For real-time processing you can also look at the Kinesis Firehose stream, which delivers your data directly to S3 or Redshift. You can modify your Kinesis producer (publisher) to transform the logs into JSON or CSV before performing the load operation; a sketch follows.
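
A minimal sketch of that producer-side transform, using boto3 and a hypothetical delivery stream name (`my-log-stream`); the key-value parsing assumption is the same as in the question:

```python
import json
import boto3

firehose = boto3.client("firehose")

def publish_log_line(line):
    # Convert the key-value line to a JSON object so that Firehose
    # delivers COPY-ready records to S3 / Redshift.
    record = dict(
        field.split("=", 1) for field in line.split() if "=" in field
    )
    firehose.put_record(
        DeliveryStreamName="my-log-stream",  # hypothetical stream name
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )
```

The trailing newline matters: Firehose concatenates records in the S3 objects it writes, so each record needs its own line for COPY to parse them individually.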

Please comment below for more help.

Upvotes: 2

omuthu

Reputation: 6333

If you have the data in S3, then try the "Loading data from S3 to Redshift" template in AWS Data Pipeline. This template takes care of loading the data from S3 into Redshift.

Note: it may use EMR, and so may launch EC2 instances to process the data before loading it into Redshift.
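
Whether you go through Data Pipeline or load manually, the final step is the same COPY statement. Here is a minimal sketch of running it yourself with psycopg2; the cluster endpoint, credentials, bucket, table, and IAM role below are all placeholders:

```python
import psycopg2

# All connection details and ARNs here are placeholders.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="...",
)

copy_sql = """
    COPY app_logs
    FROM 's3://my-log-bucket/logs/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS JSON 'auto';
"""

# COPY runs inside a transaction; the context manager commits on success.
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
conn.close()
```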

Upvotes: 1
