Reputation: 55
I have one table whereas there are 5 millions records available and each day that records add by 20000 to 50000. I want to create script where I can keep old records and add all new records each day in existing table. I am using SQL server 2008R2.
+------------+----------+-------------+-------------+
| Date | Activity | COnversions | Impressions |
+------------+----------+-------------+-------------+
| 03/03/2014 | Book | 52 | 1000 |
| 03/03/2014 | Mobile | 100 | 5000 |
| 03/04/2014 | Book | 60 | 4500 |
| 03/04/2014 | Mobile | 40 | 6000 |
| 03/05/2014 | yahoo | 58 | 2000 |
| 03/05/2014 | MSN | 46 | 1500 |
+------------+----------+-------------+-------------+
The above is exisiting table for example.
New data:
+------------+--------+-----+-------+
| 03/03/2014 | Book | 52 | 1000 |
| 03/03/2014 | Mobile | 100 | 5000 |
| 03/04/2014 | Book | 60 | 4500 |
| 03/04/2014 | Mobile | 40 | 6000 |
| 03/05/2014 | yahoo | 58 | 2000 |
| 03/05/2014 | MSN | 46 | 1500 |
| 03/06/2014 | Book | 93 | 7000 |
| 03/06/2014 | Google | 97 | 15000 |
+------------+--------+-----+-------+
Now If you notice that date 03/06/2014 is new records in the raw table and I want to insert only this new records to the above fact table whereas I am inserting all data from different tables with the help of STORE PROCEDURE.
Questions: How can I make this process (SQL script) automated through SQL server Management Studio? Whereas I can input all new records automatically and prevent existing data to get double.
I am planning to write this script and want to make this process automate via SSIS and SQL server Agent, but don't have much clue about SQL store procedure. Any help appreciated!
Upvotes: 1
Views: 1319
Reputation: 40359
Something based on the following should work:
INSERT MyTable (Date, Activity, Conversions, Impressions)
select Date, Activity, Conversions, Impressions
from RawTable
except select Date, Activity, Conversions, Impressions
from MyTable
This will:
The downside is ugly table scans on multi-million row tables, but without indexes or careful planning there's not much you can do about that.
Turning this into a basic stored procedure is fairly simple:
CREATE PROCEDURE LoadNewData
AS
INSERT MyTable (Date, Activity, Conversions, Impressions)
select Date, Activity, Conversions, Impressions
from RawTable
except select Date, Activity, Conversions, Impressions
from MyTable
GO
Lots of bells and whistles are possible here (Nocount, error handling, return codes, activity logging, etc. etc.), much depends on what you want or need to do above and beyond loading data.
As for automating it through SSMS, you want to look into SQL Agent jobs. This is a long and somewhat complex subject, and a quick google search should turn up dozens of competent articles on the subject.
Upvotes: 2
Reputation: 52675
So it appears that Date and Activity is a candidate composite key. Using that you could construct the following
MERGE CurrentData as target
USING NewData as source
ON (target.date = source.date
and target.activity = source.activity)
WHEN NOT MATCHED THEN
INSERT (Date, Activity, Conversions, Impressions)
VALUES (source.Date, source.Activity, source.Conversions, source.Impressions);
Given the number of records you have I'd make sure that you have indexes on {date, activity}
on both tables
Note: CurrentData and NewData need to be replaced with the actual table names
Upvotes: 3