Reputation: 5529
I am starting on an analytics project that will be getting data from several different sources and comparing them to one another. Sources can be anything from an API such as google analytics API to a locally hosted database.
Should I build a single database to import this data into on a regular basis?
Can anyone suggest some best practices, patterns or articles? I really don't know where to start with this so any information would be great! Thanks!
I will be using SQL Server 2008 R2, C# 4.0.
Upvotes: 2
Views: 1526
Reputation: 36
You need to take up the following steps: 1. First you need to pick up the ETL platform like SSIS, Informatica, or other ETL tools, etc. 2. Then, you need to pick up the appropriate database like Oracle or SQL server, etc. 3. Thereafter, you need to make the logical data warehouse modeling (Star or Snowflake) and 4. Finally, you need to develop the whole data ware house.
I would advise making two databases, i.e. 1. ODS for storing the data from different sources and for cleansing and 2. Warehouse database for storing all the relevant data.
Upvotes: 0
Reputation: 1192
I would start with SSIS which is a data integration technology that comes with SQL Server. It may handle a lot of the data sources you need. If you are using APIs such as Googles to get data you may need to put that in a staging table first.
Start with a single staging database which you will use as your primary source to load data into Analysis Services and see how that works out. Use SSIS to populate that staging database.
Upvotes: 3
Reputation:
That's a big question, Mike - plenty of people have entire careers doing nothing but Data Warehousing.
I would give a qualified "yes" to your first question - one of the main attractions of a DWH is that you can consolidate multiple data sources into a single source of information. (The qualification is that there may be circumstances where you don't want to do this - for example, for security or performance reasons.)
As ever, Wikipedia is a reasonable first stop for information on this subject. Since your question is already tagged with data-warehouse, StackOverflow is another possible source.
The canonical books on the subject are probably:
Note that the Inmon and Kimball approaches are radically different - Inmon concentrates on a top-down, normalised relational approach to constructing an enterprise DWH, while Kimball's approach is more bottom-up, dimensional, functional datamart-based.
The DWH Toolkit concentrates on the technical aspects of building a DWH, while The DWH Lifecycle Toolkit is based as much on the organisational challenges as on the technical details.
Good luck!
Upvotes: 4