lucky_start_izumi
lucky_start_izumi

Reputation: 2591

Data warehouse fetch data directly from db or through api

We need to pull data into our data warehouse. One of the data source is from internal. We have two options: 1. Ask the data source team to expose the data through API. 2. Ask the data source team dump the data at a daily base, grant us a read only db credential to access the dump. Could anyone please give some suggestions?

Thanks a lot!

Upvotes: 0

Views: 1552

Answers (2)

Maksym Strukov
Maksym Strukov

Reputation: 2689

API solution Cons:

  • Cost. Your data source team will have to build the api. Then you will have to build the client application to read the data from the api and insert it into the db. You will also have to host the api somewhere as well as design the deployment process. It's quite a lot of work and I think it isn't worth it.
  • Pefromance. Not necessary but usually when it comes to datawarehouses it means one has to deal with a lot of data. With an api you will most likely have to transform your data first before you can use bulk insert features of your database

Daily db dumps solution looks way better for me but I would change it slightly if I were you. I would use a flat file. Most databases have a feature to bulk insert data from a file and it usually turns out to be the fastest wat to accomplish the task.

So from what I know from your question I think you should the following:

  1. Aggree with you data source team on the data file format. This way you can work independently and even use different RDBMS.
  2. Choose a good share to which both the data source team db and your db have fast access to.
  3. Ask the data source team to implement the export logic to a file.
  4. Implement the import logic from a file.

Please note items 3 and 4 should take only a few lines of code. As I said most databases have built in and OPTIMIZED functionality to export/import data to a file.

Hope it helps!

Upvotes: 1

Rob Conklin
Rob Conklin

Reputation: 9446

A lot of this really depends on the size and nature of the data, what kind of tools you are using, whether the data source team knows of "API", etc.

I think we need a lot more information to make an informed recommendation here. I'd really recommend you have a conversation with your DBAs, see what options they have available to you, and seriously consider their recommendations. They probably have far more insight than we will concerning what would be most effective for your problem.

Upvotes: 2

Related Questions