amit2091
amit2091

Reputation: 356

How to copy materialized View JSON data from Initial table to another table Columns in AWS Redshift?

I have a Kinesis stream that dumps raw data into redshift as it is in JSON format. Below is the initial data structure and sample.

ApproximateArrivalTimestamp   |     data
--------------------------------------------------------------------------------------
2022-08-04 10:18:21.075       |     {"A" : "value of A" , "B" : "Value of B" , "C" : { "D" : "Value of D"}}

My requirement is to process incoming data periodically and put this JSON data into another table separated by columns utilizing schedulers.

Target Table

ApproximateArrivalTimestamp  |      A          |    B           |       D
--------------------------------------------------------------------------------
2022-08-04 10:18:21.075      |   value of A    |   value of B   |    value of D  

There are 2 questions here,

  1. Is there any way to select only the latest rows from the materialized view, so only the rows which were not processed earlier? I can use ApproximateArrivalTimestamp, but is there a better way?
  2. Does Redshift have any function that can use jsonpaths just like copying data from s3 to process the records in bulk or do I have to create a stored procedure using json_extract_path_text function to extract data from each row in a loop or something?

Any help or sample query based on the mentioned table will be appreciated.

Thanks for your help in advance !

Upvotes: 2

Views: 371

Answers (1)

Bill Weiner
Bill Weiner

Reputation: 11092

As for #1 - a timestamp is a common way to do this. An exact answer depends on your data solution. Does reprocessing a row cause an error or just an inefficiency? Etc

For #2 - I assume that the json keys match the columns already established in the target table. Looping is not a good way to think about SQL processing, rather joining your json with the list of target table columns will extract the data you are desiring.

Upvotes: 0

Related Questions