Reputation: 539
I have two tables which look like the following
tools:
id | part name
---------------
0 | hammer
1 | sickle
2 | axe
people:
personID | ownedTool1 | ownedTool2 | ownedTool3 ..... ownedTool20
------------------------------------------------------------------
0 | 2 | 1 | 3 ... ... 0
I'm trying to find out how many people own a particular tool. A person cannot own multiple copies of the same tool.
The only way I can think of doing this is something like
SELECT COUNT(*)
FROM tools JOIN people ON tools.id = people.ownedTool1.id OR tools.id = people.ownedTool2 ... and so on
WHERE tools.id = 0
to get the number of people who own hammers. I believe this will work, however, this involves having 20 OR
statements in the query. Surely there is a more appropriate way to form such a query and I'm interested to learn how to do this.
Upvotes: 1
Views: 1485
Reputation: 657777
You received your (warranted) lectures about the database design.
As to your question, there is a simple way:
SELECT count(*) AS person_ct
FROM tbl t
WHERE translate((t)::text, '()', ',,')
~~ ('%,' || @desired_tool_id::text || ',%')
Or, if the first column is person_id
and you want to exclude that one from the search:
SELECT count(*) AS person_ct
FROM tbl t
WHERE replace((t)::text, ')', ',')
~~ ('%,' || @desired_tool_id::text || ',%')
Every table is accompanied by a matching composite type in PostgreSQL. So you can query any table this way:
SELECT (tbl) FROM tbl;
Yields one column per row, holding the whole row.
PostgreSQL can cast such a row type to text in one fell swoop: (tbl)::text
I replace both parens ()
with a comma ,
so every value of the row is delimited by commas ,
.
My second query does not translate the opening parenthesis, so the first column (person_id
) is excluded from the search.
Now I can search all columns with a simple LIKE (~~
) expression using the desired number delimited by commas ~~ %,17,%
Voilá: all done with one simple command. This is reliable as long as you don't have columns like text
or int[]
in your table that could also hold ,17,
within their values, or additional columns with numbers, which could lead to false positives.
It won't deliver performance wonders as it cannot use standard indexes. (You could create a GiST or GIN index on an expression using the tgrm module in pg 9.1, but that's another story.)
Anyway, if you want to optimize, you'd better start by normalizing your table layout as has been suggested.
Upvotes: 2
Reputation:
If you cannot change the model (and I'm sure you will tell us that), then the only sensible way to work around this broken datamodel is to create a view that will give you a normalized view (pun intended) on the data:
create view normalized_people
as
select personid,
ownedTool1 as toolid
from people
union all
select personid,
ownedTool2 as toolid
from people
select personid,
ownedTool3 as toolid
from people
... you get the picture ...
Then your query is as simple as
select count(personid)
from normalized_people
where toolid = 0;
Upvotes: 2
Reputation: 708
You shouldn't have 20 columns each possibly containing an ID in the first place. You should properly establish a normalized schema. If a tool can belong to only one user - but a user can have multiple tools, you should establish a One to Many relationship. Each tool will have a user id in its row that maps back to the user it belongs to. If a tool can belong to one or more users you will need to establish a Many to Many relationship. This will require an intermediate table that contains rows of user_id to tool_id mappings. Having a schema set up appropriately like that will make the query you're looking to perform trivial.
In your particular case it seems like a user can have many tools and a tool can be "shared" by many users. For your many-to-many relation all you would have to do is count the number of rows in that intermediate table having your desired tool_id.
Something like this:
SELECT COUNT(ID) FROM UserTools Where ToolID = @desired_tool_id
Googling the terms I bolded should get you pointed in the correct direction. If you're stuck with that schema then the way you pointed out is the only way to do it.
Upvotes: 2