Qanthelas
Qanthelas

Reputation: 524

Counting by Pulling Data From Multiple Tables Using Multiple Keys with sqlite3

The Answer

CL.'s answer does the trick! I ended up using a Python script (which can be viewed in the next section down called "Updates: Working Towards an Answer") and once I set up my database properly so that the ID columns were set as integer keys (or, if that wasn't possible, numeric) and the Name columns were set as text then it worked!

Updates: Working Towards an Answer

I tried running a .py file that looks like this:

import sqlite3
conn = sqlite3.connect('data.db')
c = conn.cursor()

c.executescript("""
UPDATE CorpData
SET OperationID4Counter =
      (SELECT COUNT(*)
       FROM PlantData JOIN OperationData
                      ON PlantName LIKE '%' || OperationName
       WHERE OperationID IN (SELECT OperationID
                             FROM ServiceData
                             WHERE ServiceID = 512)
         AND CorpID = CorpData.CorpID)
        """)

and get this error: sqlite3.OperationalError: ambiguous column name: OperationID. I'm guessing this is because we have joined PlantData and OperationData, both of which have a column named OperationID. When I change that line of code to read WHERE OperationData.OperationID IN (SELECT OperationID or WHERE PlantData.OperationID IN (SELECT OperationID, it runs but I end up with a zero in all rows of my CorpData table under the OperationID4Counter column.

I think we're close, but no cigar. I think something is not right with the ON StationName LIKE '%' || OperationName line because when I change it to ON StationName LIKE '%house' (which should, if I understand this properly, get all that end in 'house' which would include Warehouse) I still end up with all zeroes for the OperationID4Counter (even though it should at least be counting Warehouses which do have an OperationID4.)

CL. asked for some .dump information to see what types are being used in this database. I have not specified anything so it has just been using defaults. Also note that the various tables have more columns than the ones I showed in my examples (but also note that these columns are not relevant for this question as they deal with data not related to the question at hand.) For example, one piece of the .dump for the PlantData table looks like this:

INSERT INTO "PlantData" VALUES('60015145','0','0','50000000','10000','15','386
8','1000181','30003830','20000560','10000048','Anytown 334 - Unit 3 - Widgit Corp Logistics Center','-1.444E+12','-71312793600','-9.25528E+11','0.5','0.025','4
');

A .dump piece from OperationData looks like this:

INSERT INTO "OperationData" VALUES('20','45','Manufacturing','','0','0','0','0',
'0','','','','','');

And a .dump piece from CorpData looks like this:

INSERT INTO "CorpData" VALUES(NULL,0,'1000158','Shapeset',' S',' N',' 500005','
XYZ Consortium',' 20','6','7','1','5','0');

The Background & Data Samples

I have 4 tables - 3 of which I want to draw data from to increase a counter under certain conditions and then add this counter as a new column to the 4th. This 4th table, let's call it CorpData (which I want to add more data to) currently looks like this and generally has between 10-50 rows (note that I'm using commas to show column separators):

CorpID, CorpName, Size, Type, PlantCount, OtherCounter1, OtherCounter2, OtherCounter3, OtherCounter4, OtherCounter5
100002, Widgit Corp, G, R, 25, 1, 5, 4, 3, 0
100004, ACME Corp, G, S, 15, 15, 4, 25, 28, 1

The notable pieces are CorpID (a unique key), and PlantCount which is a counter for how many plants (i.e. facilities) this corporation has.

The 1st of these additional data source tables, let's call it OperationData has data like this and has about 50 rows:

OperationID, OperationName, Description
1, Warehouse, This facility stores items
2, Distribution Center, Items are brought her from Warehouses to be distributed
3, Factory, Goods are manufactured here

The 2nd, ServiceData has around 700 rows and looks something like this:

OperationID, ServiceID
1, 4
1, 25
1, 33
1, 105
1, 19505
1, 32590
2, 4
2, 25
2, 55
2, 199
2, 19505
2, 335679
2, 529934
3, 2
3, 105
3, 55
3, 170
3, 48907

Each ServiceID is explained in yet another table, but I want to search for one or two ServiceIDs that I will specify like 4 and 55.

The last of the data tables of note, let's call it PlantData, has details for all the plants for all the corporations so it has around 5200 rows and looks like this:

PlantID, CorpID, CityID, CountryID, PlantName
60000004, 100002, 74900, 34590, Somewhereville 123 - Widgit Corp Warehouse
60000007, 100002, 74878, 34590, Anytown 334 - Unit 3 - Widgit Corp Distribution Center
60000023, 100002, 56799, 23487, Quietville 532 - Unit 4 - Widgit Corp Warehouse
60000027, 100004, 74900, 34590, Somewhereville 544 - Unit 3 - ACME Corp Distribution Center
60000150, 100004, 56799, 23487, Quietville 312 - Unit 2 - ACME Corp Factory
60000155, 100004, 56799, 23487, Quietville 312 - Unit 4 - ACME Corp Warehouse

Note the following: 1) CorpID in this table matches CorpID in my starting table 2) The CorpName for a given CorpID will always appear in the PlantName 3) The PlantName also contains one OperationName 4) One CityID can have multiple corporation's plants as well as multiple of the same corporation's plants. 4) As a sidenote, this is just a small piece of this table and if you counted all the times that a given CorpID shows up in this table it would be the same as the PlantCount for that CorpID (so this can be used as a check of some sort to make sure no plants were missed.)

The Question

I want to add two new columns to CorpData table, both of them will be counts - the first will be a count of how many plants that corp has with the ServiceID 4 and the second a count of how many plants that corp has with ServiceID 55. To do this I need to look over the long PlantData table, parse out the OperationName (from the OperationData table) from each PlantName, check to find that OperationName's corresponding OperationID (in the OperationData table), and see if that OperationID is listed with the ServiceID (from the ServiceData table) in question (4 in the first case and 55 in the 2nd.)

I intend to do this using sqlite3 with my 4 tables stored in a .db file, but I might be open to other options if you can make a solid case why I should use that option over sqlite3.

The Goal

My end goal, given the examples here, would have the CorpData table looking like this:

CorpID, CorpName, Size, Type, PlantCount, OtherCounter1, OtherCounter2, OtherCounter3, OtherCounter4, OtherCounter5, OperationID4Counter, OperationID55Counter
100002, Widgit Corp, G, R, 25, 1, 5, 4, 3, 0, 3, 1
100004, ACME Corp, G, S, 15, 15, 4, 25, 28, 1, 2, 2

This is because Widgit Corp has two Warehouses and a Distribution Center, Warehouses and Distribution Centers both have OperationID 4 but only a Distribution Center and a Factory have OperationID 55 and ACME Corp has 1 each of Factory, Warehouse, and Distribution Center and Factories do not have OperationID 4 but they do have OperationID 55.

Other Notes

Here's some things that I think might make this trickier:

  1. PlantName contains several words and it may or may not have the Unit X - part. OperationName is not always just one word long and CorpName can also be more (or less) than two words. Thus finding OperationName within PlantName will probably have to look over the whole thing rather than trying to split it into pieces somehow to search over just the piece that likely contains the OperationName.
  2. As we work our way down the PlantData table and count matches we have to keep checking two other tables to see if a row in PlantData should be counted or not. I'm concerned that if this code is not built properly that it might end up being very slow.
  3. It's tricky for me at least to wrap my mind around all this so it might be easy to refer to the wrong table or miss a step in the search from things like OperationName to corresponding OperationID.

Upvotes: 4

Views: 549

Answers (1)

CL.
CL.

Reputation: 180290

UPDATE CorpData
SET OperationID4Counter =
      (SELECT COUNT(*)
       FROM PlantData JOIN OperationData
                      ON PlantName LIKE '%' || OperationName
       WHERE OperationData.OperationID IN (SELECT OperationID
                                           FROM ServiceData
                                           WHERE ServiceID = 4)
         AND CorpID = CorpData.CorpID)

Upvotes: 1

Related Questions