Veeru Nidadavolu

Reputation: 1

Replicating HAWQ data between clusters

I need to refresh a QA environment from the production HAWQ database on a daily basis.

How can I move each day's delta from the Production cluster into the QA cluster?

I appreciate your help.

Thanks, Veeru

Upvotes: 0

Views: 67

Answers (2)

Kyle Dunn

Reputation: 370

Shameless self-plug - have a look at the following open PR for using Apache Falcon to orchestrate a DR batch job and see if it fits your needs.

https://github.com/apache/incubator-hawq/pull/940

Here is the synopsis of the process:

  1. Run hawqsync-extract to capture known-good HDFS file sizes (protects against HDFS / catalog inconsistency if a failure occurs during the sync)
  2. Run ETL batch (if any)
  3. Run hawqsync-falcon, which performs the following steps:
    1. Stop both HAWQ masters (source and target)
    2. Archive source MASTER_DATA_DIRECTORY (MDD) tarball to HDFS
    3. Restart source HAWQ master
    4. Enable HDFS safe mode and force source checkpoint
    5. Disable source and remote HDFS safe mode
    6. Execute Apache Falcon-based distcp sync process
    7. Enable HDFS safe mode and force remote checkpoint
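The safe mode and checkpoint steps above can be illustrated with the standard `hdfs dfsadmin` subcommands. This is a minimal sketch and is not taken from the PR: the function name `checkpoint_commands` and the NameNode URI are hypothetical, and it is only an assumption that hawqsync-falcon does something equivalent internally.

```python
import subprocess


def checkpoint_commands(namenode_uri):
    """Build the hdfs dfsadmin calls for one cluster: enter safe mode,
    force a checkpoint (saveNamespace), then leave safe mode.

    namenode_uri is a placeholder such as "hdfs://source-nn:8020".
    """
    base = ["hdfs", "dfsadmin", "-fs", namenode_uri]
    return [
        base + ["-safemode", "enter"],
        base + ["-saveNamespace"],  # only allowed while in safe mode
        base + ["-safemode", "leave"],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        runner(cmd, check=True)
```

Calling `run_all(checkpoint_commands("hdfs://source-nn:8020"))` would run the three commands against the source cluster. For the remote cluster, step 7 as written leaves safe mode enabled, so the last command would be dropped there.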

There is also a JIRA with the design description:

https://issues.apache.org/jira/browse/HAWQ-1078

Upvotes: 1

Jon Roberts

Reputation: 2106

There isn't a built-in tool to do this, so you'll have to write some code. It shouldn't be too difficult, because HAWQ doesn't support UPDATE or DELETE: you only have to append new data to QA.

  • In Production, create a writable external table for each table; it writes that table's data to HDFS. You'll use PXF to write the data.
  • In QA, create a readable external table for each table; it reads the same data back from HDFS.
  • Day 1: write everything to HDFS from Production, then read everything from HDFS into QA.
  • Day 2+: find the max(id) in QA, then remove the table's files from HDFS. Insert into the writable external table in Production, filtering the query so that you get only records with an id greater than the max(id) from QA. Lastly, execute an insert in QA that selects all data from the readable external table.
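The Day 2+ loop above could be sketched as a small planner that emits the statements in order. This is only an illustration under stated assumptions: the `_ext_w` / `_ext_r` naming for the external tables, the `id` column, and the HDFS staging path are hypothetical, and the external tables are assumed to exist already.

```python
def max_id_query(table, id_column):
    """SQL to run in QA to find the high-water mark (0 if QA is empty)."""
    return f"SELECT COALESCE(max({id_column}), 0) FROM {table};"


def build_delta_plan(table, id_column, qa_max_id, hdfs_dir):
    """Return (where_to_run, command) pairs for one table's daily delta.

    Assumes <table>_ext_w is the writable external table in Production
    and <table>_ext_r is the readable external table in QA, both pointing
    at hdfs_dir. These names are placeholders.
    """
    ext_w = f"{table}_ext_w"
    ext_r = f"{table}_ext_r"
    return [
        # Clear yesterday's staged files so QA does not re-read them.
        ("shell", f"hdfs dfs -rm -r -f '{hdfs_dir}/*'"),
        # Stage only the rows QA has not seen yet.
        ("production",
         f"INSERT INTO {ext_w} SELECT * FROM {table} "
         f"WHERE {id_column} > {int(qa_max_id)};"),
        # Append the staged rows to the QA table.
        ("qa", f"INSERT INTO {table} SELECT * FROM {ext_r};"),
    ]
```

For example, after running `max_id_query("sales", "id")` in QA and getting 100, `build_delta_plan("sales", "id", 100, "/staging/sales")` yields the cleanup command and the two inserts. This approach relies on each table having a monotonically increasing id column; tables without one would need a different filter, such as a load timestamp.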

Upvotes: 0
