Reputation: 524
CL.'s answer does the trick! I ended up using a Python script (which can be viewed in the next section down called "Updates: Working Towards an Answer") and once I set up my database properly so that the ID columns were set as integer keys (or, if that wasn't possible, numeric) and the Name columns were set as text then it worked!
I tried running a .py file that looks like this:
import sqlite3
conn = sqlite3.connect('data.db')
c = conn.cursor()
c.executescript("""
UPDATE CorpData
SET OperationID4Counter =
(SELECT COUNT(*)
FROM PlantData JOIN OperationData
ON PlantName LIKE '%' || OperationName
WHERE OperationID IN (SELECT OperationID
FROM ServiceData
WHERE ServiceID = 512)
AND CorpID = CorpData.CorpID)
""")
and get this error: sqlite3.OperationalError: ambiguous column name: OperationID
. I'm guessing this is because we have joined PlantData and OperationData, both of which have a column named OperationID
. When I change that line of code to read WHERE OperationData.OperationID IN (SELECT OperationID
or WHERE PlantData.OperationID IN (SELECT OperationID
, it runs but I end up with a zero in all rows of my CorpData
table under the OperationID4Counter
column.
I think we're close, but no cigar. I think something is not right with the ON StationName LIKE '%' || OperationName
line because when I change it to ON StationName LIKE '%house'
(which should, if I understand this properly, get all that end in 'house' which would include Warehouse) I still end up with all zeroes for the OperationID4Counter (even though it should at least be counting Warehouses which do have an OperationID4.)
CL. asked for some .dump information to see what types are being used in this database. I have not specified anything so it has just been using defaults. Also note that the various tables have more columns than the ones I showed in my examples (but also note that these columns are not relevant for this question as they deal with data not related to the question at hand.) For example, one piece of the .dump for the PlantData table looks like this:
INSERT INTO "PlantData" VALUES('60015145','0','0','50000000','10000','15','386
8','1000181','30003830','20000560','10000048','Anytown 334 - Unit 3 - Widgit Corp Logistics Center','-1.444E+12','-71312793600','-9.25528E+11','0.5','0.025','4
');
A .dump piece from OperationData looks like this:
INSERT INTO "OperationData" VALUES('20','45','Manufacturing','','0','0','0','0',
'0','','','','','');
And a .dump piece from CorpData looks like this:
INSERT INTO "CorpData" VALUES(NULL,0,'1000158','Shapeset',' S',' N',' 500005','
XYZ Consortium',' 20','6','7','1','5','0');
I have 4 tables - 3 of which I want to draw data from to increase a counter under certain conditions and then add this counter as a new column to the 4th. This 4th table, let's call it CorpData
(which I want to add more data to) currently looks like this and generally has between 10-50 rows (note that I'm using commas to show column separators):
CorpID, CorpName, Size, Type, PlantCount, OtherCounter1, OtherCounter2, OtherCounter3, OtherCounter4, OtherCounter5
100002, Widgit Corp, G, R, 25, 1, 5, 4, 3, 0
100004, ACME Corp, G, S, 15, 15, 4, 25, 28, 1
The notable pieces are CorpID (a unique key), and PlantCount which is a counter for how many plants (i.e. facilities) this corporation has.
The 1st of these additional data source tables, let's call it OperationData
has data like this and has about 50 rows:
OperationID, OperationName, Description
1, Warehouse, This facility stores items
2, Distribution Center, Items are brought her from Warehouses to be distributed
3, Factory, Goods are manufactured here
The 2nd, ServiceData
has around 700 rows and looks something like this:
OperationID, ServiceID
1, 4
1, 25
1, 33
1, 105
1, 19505
1, 32590
2, 4
2, 25
2, 55
2, 199
2, 19505
2, 335679
2, 529934
3, 2
3, 105
3, 55
3, 170
3, 48907
Each ServiceID is explained in yet another table, but I want to search for one or two ServiceIDs that I will specify like 4 and 55.
The last of the data tables of note, let's call it PlantData
, has details for all the plants for all the corporations so it has around 5200 rows and looks like this:
PlantID, CorpID, CityID, CountryID, PlantName
60000004, 100002, 74900, 34590, Somewhereville 123 - Widgit Corp Warehouse
60000007, 100002, 74878, 34590, Anytown 334 - Unit 3 - Widgit Corp Distribution Center
60000023, 100002, 56799, 23487, Quietville 532 - Unit 4 - Widgit Corp Warehouse
60000027, 100004, 74900, 34590, Somewhereville 544 - Unit 3 - ACME Corp Distribution Center
60000150, 100004, 56799, 23487, Quietville 312 - Unit 2 - ACME Corp Factory
60000155, 100004, 56799, 23487, Quietville 312 - Unit 4 - ACME Corp Warehouse
Note the following: 1) CorpID in this table matches CorpID in my starting table 2) The CorpName for a given CorpID will always appear in the PlantName 3) The PlantName also contains one OperationName 4) One CityID can have multiple corporation's plants as well as multiple of the same corporation's plants. 4) As a sidenote, this is just a small piece of this table and if you counted all the times that a given CorpID shows up in this table it would be the same as the PlantCount for that CorpID (so this can be used as a check of some sort to make sure no plants were missed.)
I want to add two new columns to CorpData
table, both of them will be counts - the first will be a count of how many plants that corp has with the ServiceID 4 and the second a count of how many plants that corp has with ServiceID 55. To do this I need to look over the long PlantData
table, parse out the OperationName (from the OperationData
table) from each PlantName, check to find that OperationName's corresponding OperationID (in the OperationData
table), and see if that OperationID is listed with the ServiceID (from the ServiceData
table) in question (4 in the first case and 55 in the 2nd.)
I intend to do this using sqlite3 with my 4 tables stored in a .db file, but I might be open to other options if you can make a solid case why I should use that option over sqlite3.
My end goal, given the examples here, would have the CorpData
table looking like this:
CorpID, CorpName, Size, Type, PlantCount, OtherCounter1, OtherCounter2, OtherCounter3, OtherCounter4, OtherCounter5, OperationID4Counter, OperationID55Counter
100002, Widgit Corp, G, R, 25, 1, 5, 4, 3, 0, 3, 1
100004, ACME Corp, G, S, 15, 15, 4, 25, 28, 1, 2, 2
This is because Widgit Corp has two Warehouses and a Distribution Center, Warehouses and Distribution Centers both have OperationID 4 but only a Distribution Center and a Factory have OperationID 55 and ACME Corp has 1 each of Factory, Warehouse, and Distribution Center and Factories do not have OperationID 4 but they do have OperationID 55.
Here's some things that I think might make this trickier:
Unit X -
part. OperationName is not always just one word long and CorpName can also be more (or less) than two words. Thus finding OperationName within PlantName will probably have to look over the whole thing rather than trying to split it into pieces somehow to search over just the piece that likely contains the OperationName.PlantData
table and count matches we have to keep checking two other tables to see if a row in PlantData
should be counted or not. I'm concerned that if this code is not built properly that it might end up being very slow.Upvotes: 4
Views: 549
Reputation: 180290
UPDATE CorpData
SET OperationID4Counter =
(SELECT COUNT(*)
FROM PlantData JOIN OperationData
ON PlantName LIKE '%' || OperationName
WHERE OperationData.OperationID IN (SELECT OperationID
FROM ServiceData
WHERE ServiceID = 4)
AND CorpID = CorpData.CorpID)
Upvotes: 1