jpcn
jpcn

Reputation: 21

Add rows in table only if those rows do not already exist in that same table

I have a bunch of .csv files containing client information about their cloud usage. Each .csv contains data about one client and each file tracks what the client has used for the day. I have one year worth of those files, for every client. For instance : I have client1__05022020, client1_06022020, client2_05022020, etc until client1_06022021 and that's so for all the other clients.

I have a temporary table named Cloud__TMP. In this table, I integrate 1 raw .csv file and then in a stored procedure, I have sql requests which fill other tables like my "client" table. Then I delete what's in Cloud__TMP and integrate another .csv and it goes on and on.

As I said, I have a "client" table with id (primary key), idCloud (nvarchar), clientName (nvarchar). I want to fill it with data from Cloud__TMP. Basically, there are 2 columns in Cloud__TMP that are of interest for my "client" table : Org (which corresponds to idCloud) and OrgFullName (which corresponds to clientName). edit : It's important to note that there is a multitude of rows in each file since it's about cloud usage (everytime the client uses more or less, a new row is put in the .csv file). So, my columns OrgFullName and idCloud are filled the same client and the same cloud id many, many times.

My issue is that I have trouble making it so there are no duplicates in my "client" table. I tried a bunch of requests but I'm not very good at it and could use your help.

Here's one request I've tried:

INSERT INTO client (clientName, idCloud)
        SELECT OrgFullName, Org
        FROM Cloud__TMP AS Cloud
        WHERE NOT EXISTS (SELECT * FROM client WHERE Cloud.Org = client.idCloud)

Upvotes: 1

Views: 86

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269513

Your insert is fine . . . unless the incoming data has duplicates. You may want to remove those as well:

INSERT INTO client (clientName, idCloud)
    SELECT MAX(OrgFullName), Org
    FROM Cloud__TMP AS Cloud
    WHERE NOT EXISTS (SELECT * FROM client WHERE Cloud.Org = client.idCloud)
    GROUP BY Org;

This arbitrarily chooses one value when the incoming data has duplicates.

Upvotes: 2

Related Questions