Andrew C

Reputation: 117

Insert rows into a table for records that do not have a specific column value

I have a table with missing information that needs to be inserted under very specific conditions, but I am unsure how to proceed.

I have one table that contains an overview of the data. Using projects as an example:

ProjectID  ProjectName
1          Project A
2          Project B
3          Project C
4          Project D

A second table holds a subset of that information: the status history (statuses and dates) of those projects.

ProjectStatusID  ProjectID  ProjectStatus  Date
1                1          Started        2/10/2020
2                1          In Testing     5/6/2021
3                1          Finished       7/1/2021
4                2          In Testing     1/30/2019
5                2          Finished       3/18/2020
6                4          Started        10/22/2016
7                4          Finished       3/18/2020

As you can see from the second table, there is no entry for when Project B started, and there are no entries for Project C at all.

I want to add a 'Started' entry, with the arbitrary date 1/1/2000, for every project that is missing one. The second table would then look like this:

ProjectStatusID  ProjectID  ProjectStatus  Date
1                1          Started        2/10/2020
2                1          In Testing     5/6/2021
3                1          Finished       7/1/2021
4                2          In Testing     1/30/2019
5                2          Finished       3/18/2020
6                4          Started        10/22/2016
7                4          Finished       3/18/2020
8                2          Started        1/1/2000
9                3          Started        1/1/2000

My first idea was to get all the projects that don't have the desired status, but the query I wrote returned the wrong results.

SELECT t.ProjectID FROM t1 t
INNER JOIN t2 tt ON t.ProjectID = tt.ProjectID
WHERE tt.ProjectStatus != 'Started'
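The following sketch shows why that query misbehaves. It loads the sample data into an in-memory SQLite database and runs the join with the alias written as `tt`. SQLite stands in for SQL Server, and dates are stored as ISO-8601 text; both are assumptions for illustration. The `!=` filter is evaluated per status row rather than per project, so any project with at least one non-'Started' row is returned. Project 3, which has no rows at all, is removed by the INNER JOIN.

```python
import sqlite3

# In-memory SQLite stand-in for SQL Server (assumption); ISO-8601 dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (ProjectID INTEGER PRIMARY KEY, ProjectName TEXT);
CREATE TABLE t2 (ProjectStatusID INTEGER PRIMARY KEY, ProjectID INTEGER,
                 ProjectStatus TEXT, Date TEXT);
INSERT INTO t1 VALUES (1,'Project A'),(2,'Project B'),(3,'Project C'),(4,'Project D');
INSERT INTO t2 VALUES
  (1,1,'Started','2020-02-10'),(2,1,'In Testing','2021-05-06'),
  (3,1,'Finished','2021-07-01'),(4,2,'In Testing','2019-01-30'),
  (5,2,'Finished','2020-03-18'),(6,4,'Started','2016-10-22'),
  (7,4,'Finished','2020-03-18');
""")

# The WHERE clause filters individual joined rows, not whole projects:
# each non-'Started' row produces one output row, and project 3 never
# survives the INNER JOIN because it has no rows in t2.
wrong = [r[0] for r in conn.execute("""
    SELECT t.ProjectID FROM t1 t
    INNER JOIN t2 tt ON t.ProjectID = tt.ProjectID
    WHERE tt.ProjectStatus != 'Started'
    ORDER BY t.ProjectID
""")]
print(wrong)  # [1, 1, 2, 2, 4]
```

The desired answer is projects 2 and 3. The query instead returns projects 1 and 4, which already have a 'Started' row, and it misses project 3.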

I then tried the other direction, finding all the projects that do have the desired status, and that worked:

SELECT ProjectStatusID, ProjectID, ProjectStatus FROM t2
WHERE ProjectStatus NOT IN (SELECT ProjectStatus FROM t2
WHERE ProjectStatus != 'Started')

However, I am unsure how to use this result column in another query. I could write a long WHERE clause and copy and paste all of the ProjectID values into it, but that is clearly inefficient, especially with many more projects than in the example above.
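One common way to reuse that result without copying IDs by hand is to place the query in a subquery and negate it with NOT IN. The sketch below runs on an in-memory SQLite stand-in, uses the question's table names t1/t2, and stores dates as ISO-8601 text; all of these are assumptions. It writes the filter as `ProjectStatus = 'Started'`, which selects the same rows as the question's second query. The `IS NOT NULL` guard is there because NOT IN matches nothing if the subquery returns a NULL.

```python
import sqlite3

# In-memory SQLite stand-in for SQL Server (assumption); ISO-8601 dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (ProjectID INTEGER PRIMARY KEY, ProjectName TEXT);
CREATE TABLE t2 (ProjectStatusID INTEGER PRIMARY KEY, ProjectID INTEGER,
                 ProjectStatus TEXT, Date TEXT);
INSERT INTO t1 VALUES (1,'Project A'),(2,'Project B'),(3,'Project C'),(4,'Project D');
INSERT INTO t2 VALUES
  (1,1,'Started','2020-02-10'),(2,1,'In Testing','2021-05-06'),
  (3,1,'Finished','2021-07-01'),(4,2,'In Testing','2019-01-30'),
  (5,2,'Finished','2020-03-18'),(6,4,'Started','2016-10-22'),
  (7,4,'Finished','2020-03-18');
""")

# Projects that already have a 'Started' row, used as a subquery
# instead of a hand-typed list of IDs.
STARTED = """
    SELECT ProjectID FROM t2
    WHERE ProjectStatus = 'Started'
      AND ProjectID IS NOT NULL  -- a NULL here would make NOT IN match nothing
"""

missing = [r[0] for r in conn.execute(
    f"SELECT ProjectID FROM t1 WHERE ProjectID NOT IN ({STARTED}) ORDER BY ProjectID"
)]
print(missing)  # [2, 3]

# The same subquery feeding INSERT ... SELECT; SQLite assigns the next
# ProjectStatusID values automatically.
conn.execute(f"""
    INSERT INTO t2 (ProjectID, ProjectStatus, Date)
    SELECT ProjectID, 'Started', '2000-01-01' FROM t1
    WHERE ProjectID NOT IN ({STARTED})
    ORDER BY ProjectID
""")
new_rows = conn.execute(
    "SELECT * FROM t2 WHERE ProjectStatusID > 7 ORDER BY ProjectStatusID"
).fetchall()
print(new_rows)  # [(8, 2, 'Started', '2000-01-01'), (9, 3, 'Started', '2000-01-01')]
```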

Upvotes: 4

Views: 119

Answers (2)

Jonas Metzler

Reputation: 5975

The other answer explains well how to insert only the missing rows; I would use exactly the same INSERT command for that.

Just to go a step further: in your use case, it seems very likely you never want the same project to have the same status more than once, unless a project can be "restarted" and finished again. If that is possible, ignore this answer.

Otherwise, if a project should be started once, in testing once and finished once, that is a good use case for a unique index like this:

CREATE UNIQUE INDEX UNQ_Project_Status ON ProjectStatus (ProjectId, ProjectStatus)
  WITH (IGNORE_DUP_KEY = ON); -- this WITH clause is optional

This will make sure the same project can't be started/in testing/finished multiple times.
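Here is a quick sketch of the index's effect. It uses SQLite in place of SQL Server, which is an assumption for illustration. SQLite has no IGNORE_DUP_KEY option, so the WITH clause is dropped. A second 'Started' row for the same project raises an integrity error, while a different status for that project is still accepted.

```python
import sqlite3

# In-memory SQLite stand-in for SQL Server (assumption); ISO-8601 dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProjectStatus (ProjectStatusID INTEGER PRIMARY KEY,
                            ProjectId INTEGER, ProjectStatus TEXT, Date TEXT);
-- Same index as in the answer, minus the SQL Server-only WITH clause.
CREATE UNIQUE INDEX UNQ_Project_Status ON ProjectStatus (ProjectId, ProjectStatus);
INSERT INTO ProjectStatus (ProjectId, ProjectStatus, Date)
VALUES (1, 'Started', '2020-02-10');
""")

# A second 'Started' row for project 1 violates the unique index.
try:
    conn.execute(
        "INSERT INTO ProjectStatus (ProjectId, ProjectStatus, Date) "
        "VALUES (1, 'Started', '2000-01-01')"
    )
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

# A different status for the same project does not collide.
conn.execute(
    "INSERT INTO ProjectStatus (ProjectId, ProjectStatus, Date) "
    "VALUES (1, 'In Testing', '2021-05-06')"
)
count = conn.execute("SELECT COUNT(*) FROM ProjectStatus").fetchone()[0]
print(count)  # 2
```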

Note the optional WITH (IGNORE_DUP_KEY = ON) part.

If you run INSERT commands and don't care which rows would violate the index above, that is a good use case for the WITH clause (although one can argue it is "cleaner" to exclude duplicates in the query rather than let a constraint or index catch them).

In that case, you can simply execute the INSERT command without writing a condition that checks which projects have already been started:

INSERT INTO ProjectStatus (ProjectId, ProjectStatus, [Date])
  SELECT ProjectId, 'Started', '1/1/2000' FROM Project p;
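SQLite has no IGNORE_DUP_KEY option. Its closest analog is INSERT OR IGNORE, which skips rows that would violate the unique index. The sketch below demonstrates that swapped-in technique, not SQL Server behavior. It assumes the question's sample data with ISO-8601 dates. OR IGNORE also skips rows that break other constraints, such as NOT NULL, so it is broader than IGNORE_DUP_KEY.

```python
import sqlite3

# In-memory SQLite stand-in for SQL Server (assumption); ISO-8601 dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project (ProjectId INTEGER PRIMARY KEY, ProjectName TEXT);
CREATE TABLE ProjectStatus (ProjectStatusID INTEGER PRIMARY KEY,
                            ProjectId INTEGER, ProjectStatus TEXT, Date TEXT);
CREATE UNIQUE INDEX UNQ_Project_Status ON ProjectStatus (ProjectId, ProjectStatus);
INSERT INTO Project VALUES (1,'Project A'),(2,'Project B'),(3,'Project C'),(4,'Project D');
INSERT INTO ProjectStatus VALUES
  (1,1,'Started','2020-02-10'),(2,1,'In Testing','2021-05-06'),
  (3,1,'Finished','2021-07-01'),(4,2,'In Testing','2019-01-30'),
  (5,2,'Finished','2020-03-18'),(6,4,'Started','2016-10-22'),
  (7,4,'Finished','2020-03-18');
""")

# No NOT EXISTS filter: the candidate rows for projects 1 and 4 hit the
# unique index and are silently skipped by OR IGNORE.
conn.execute("""
    INSERT OR IGNORE INTO ProjectStatus (ProjectId, ProjectStatus, Date)
    SELECT ProjectId, 'Started', '2000-01-01' FROM Project p ORDER BY ProjectId
""")
started = conn.execute(
    "SELECT ProjectId, Date FROM ProjectStatus "
    "WHERE ProjectStatus = 'Started' ORDER BY ProjectId"
).fetchall()
total = conn.execute("SELECT COUNT(*) FROM ProjectStatus").fetchone()[0]
print(started)
# [(1, '2020-02-10'), (2, '2000-01-01'), (3, '2000-01-01'), (4, '2016-10-22')]
print(total)  # 9
```

The existing 'Started' dates for projects 1 and 4 are left untouched. Only projects 2 and 3 receive the placeholder date.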

Only the missing rows are inserted; the others are skipped. You are informed that duplicates have been ignored:

Duplicate key was ignored. 2 rows affected

If you don't want this "internal check" and would rather handle duplicates yourself and know which INSERT commands fail, you can omit the WITH clause from the index definition but still create the index.

You would then go back to the INSERT command with NOT EXISTS to exclude duplicates, and the index still guarantees that you never insert duplicates by mistake.

If you ever forget to check for possible duplicates in an INSERT command, you get the usual error message saying that the index has been violated.

See this sample fiddle

Read the documentation about this and other index options.

Upvotes: 0

Dale K

Reputation: 27388

I think you just want NOT EXISTS to check which Projects are missing the 'Started' status.

insert into ProjectStatus (ProjectId, ProjectStatus, [Date])
select ProjectId, 'Started', '1/1/2000'
from Project p
where not exists (
  select 1
  from ProjectStatus ps
  where ps.ProjectId = p.ProjectId
  and ps.ProjectStatus = 'Started'
);
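Below is a minimal check of this query against the question's sample data. It uses in-memory SQLite as a stand-in for SQL Server and ISO-8601 dates; both are assumptions. The ORDER BY is added only so that the new ProjectStatusID values come out as 8 and 9, matching the expected table. Running the statement a second time inserts nothing, because every project then has a 'Started' row.

```python
import sqlite3

# In-memory SQLite stand-in for SQL Server (assumption); ISO-8601 dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project (ProjectId INTEGER PRIMARY KEY, ProjectName TEXT);
CREATE TABLE ProjectStatus (ProjectStatusID INTEGER PRIMARY KEY,
                            ProjectId INTEGER, ProjectStatus TEXT, Date TEXT);
INSERT INTO Project VALUES (1,'Project A'),(2,'Project B'),(3,'Project C'),(4,'Project D');
INSERT INTO ProjectStatus VALUES
  (1,1,'Started','2020-02-10'),(2,1,'In Testing','2021-05-06'),
  (3,1,'Finished','2021-07-01'),(4,2,'In Testing','2019-01-30'),
  (5,2,'Finished','2020-03-18'),(6,4,'Started','2016-10-22'),
  (7,4,'Finished','2020-03-18');
""")

# The answer's NOT EXISTS insert, with an ISO date literal.
INSERT_MISSING_STARTED = """
    INSERT INTO ProjectStatus (ProjectId, ProjectStatus, Date)
    SELECT ProjectId, 'Started', '2000-01-01'
    FROM Project p
    WHERE NOT EXISTS (
        SELECT 1
        FROM ProjectStatus ps
        WHERE ps.ProjectId = p.ProjectId
          AND ps.ProjectStatus = 'Started'
    )
    ORDER BY ProjectId
"""

conn.execute(INSERT_MISSING_STARTED)
after_first = conn.execute(
    "SELECT * FROM ProjectStatus WHERE ProjectStatusID > 7 ORDER BY ProjectStatusID"
).fetchall()
print(after_first)  # [(8, 2, 'Started', '2000-01-01'), (9, 3, 'Started', '2000-01-01')]

# A second run finds no project without a 'Started' row.
conn.execute(INSERT_MISSING_STARTED)
total = conn.execute("SELECT COUNT(*) FROM ProjectStatus").fetchone()[0]
print(total)  # 9
```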

And if you only wanted to add the 'Started' status to projects that have at least one status recorded but no 'Started' status (which isn't what you asked), you could check for the missing status using GROUP BY and HAVING, e.g.

select ProjectId, 'Started', '1/1/2000'
from ProjectStatus
group by ProjectId
having sum(case when ProjectStatus = 'Started' then 1 else 0 end) = 0;
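The sketch below contrasts the two filters on the sample data. It uses an in-memory SQLite stand-in for SQL Server with ISO-8601 dates; both are assumptions. The HAVING version only sees projects that appear in ProjectStatus, so it returns project 2 but not project 3. NOT EXISTS, driven from the Project table, returns both.

```python
import sqlite3

# In-memory SQLite stand-in for SQL Server (assumption); ISO-8601 dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Project (ProjectId INTEGER PRIMARY KEY, ProjectName TEXT);
CREATE TABLE ProjectStatus (ProjectStatusID INTEGER PRIMARY KEY,
                            ProjectId INTEGER, ProjectStatus TEXT, Date TEXT);
INSERT INTO Project VALUES (1,'Project A'),(2,'Project B'),(3,'Project C'),(4,'Project D');
INSERT INTO ProjectStatus VALUES
  (1,1,'Started','2020-02-10'),(2,1,'In Testing','2021-05-06'),
  (3,1,'Finished','2021-07-01'),(4,2,'In Testing','2019-01-30'),
  (5,2,'Finished','2020-03-18'),(6,4,'Started','2016-10-22'),
  (7,4,'Finished','2020-03-18');
""")

# GROUP BY/HAVING: only projects with at least one status row are grouped,
# and a group qualifies when it contains zero 'Started' rows.
having_ids = [r[0] for r in conn.execute("""
    SELECT ProjectId, 'Started', '2000-01-01'
    FROM ProjectStatus
    GROUP BY ProjectId
    HAVING SUM(CASE WHEN ProjectStatus = 'Started' THEN 1 ELSE 0 END) = 0
    ORDER BY ProjectId
""")]

# NOT EXISTS driven from Project: also catches projects with no rows at all.
not_exists_ids = [r[0] for r in conn.execute("""
    SELECT ProjectId FROM Project p
    WHERE NOT EXISTS (
        SELECT 1 FROM ProjectStatus ps
        WHERE ps.ProjectId = p.ProjectId AND ps.ProjectStatus = 'Started'
    )
    ORDER BY ProjectId
""")]

print(having_ids)      # [2]
print(not_exists_ids)  # [2, 3]
```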

DBFiddle

Upvotes: 4
