Reputation: 1958
I'm trying to understand if there are any standard best practice approaches for modelling number ranges in a Relational Database (In this case MySQL) and if this is in fact a sensible thing to do.
I shall explain the task that prompted the question for context.
I'm currently in the process of designing a database which will model the allocation of a pool of Identifiers to Customers.
The pool of potential Identifiers has a range from 0 to about 2^30
A given customer could be allocated any number of Identifiers from a single Identifier to millions in multiple contiguous blocks.
A given Identifier may only be allocated to a single customer (i.e. it is a one to many relationship)
Clearly there will be a Customer table and and an Identifier table containing the Customer key.
The complexity comes with how to model the Identifiers:
Option one would be to have a row represent single identifier. This will result in a potentially huge number of rows in the table, but would make searching for who owns which identifier and if a given identifier is in use trivial.
The second (and I think more promising) option would be to have a row represent a range of values with a minimum and maximum value. This would make queries a bit more complex (I'm assuming the query for checking if an identifier was in use would be to query for ranges with "Minimum lower than X" and a "Maximum higher than X") but would result in far fewer rows and would likely be easier to manage and update.
I would welcome any views on if this is a good approach and if not if there is an obvious better approach that I am missing.
Upvotes: 14
Views: 13374
Reputation: 461
One possibility is to use a regular expression to represent the pool of identifiers, casting between strings and numbers as needed. The problem here is to find a regular expression for a given list of identifiers. This might be automated using the Aho–Corasick algorithm. This is only practical if these pools of IDs mostly look the same. Obviously if they are randomly assigned, then its going to be hard to find a regular expression much better than a long list of ORd literals.
Upvotes: 0
Reputation: 21
If I understand you correctly you're needing to work with multiple ranges, which could get tricky. You might want to look at PostgreSQL 9.2 range types. They look relevant to what you're trying to do.
In the real world ranges can overlap, contain each other or not overlap, and they can be open or closed, making range-checking queries potentially complex and error-prone. Range types remove most of this complexity and they're supported natively by indexing.
https://wiki.postgresql.org/images/7/73/Range-types-pgopen-2012.pdf
Best wishes,
Nick
Upvotes: 2
Reputation: 29629
Normally, I wouldn't try to reduce the number of rows just for the sake of it - in principle, a well-indexed table with a billion rows should be just as quick as a table with 100 rows, as long as your queries hit the index.
I'd work some more on the actual queries you are likely to want to run, and design the solution on that basis. For instance, would you want to list all the IDs that belong to a single customer? Would you want to check which customer owns several IDs? Would you want to find how many IDs a customer owns?
The latter is a little tricky if you have "range" tables - instead of doing "select count(*) from ranges where customer = 1"
, you'd have to calculate the number IPs in each range for the customer, and add them up. Not rocket science, but might be slower in the real world...
Upvotes: 1
Reputation: 76641
If you make a table like so
table ids
id_start not null unsigned integer /*not autoincrement!*/
id_end not null unsigned integer
customer_id unsigned integer not null
foreign key FK_customer (customer_id) REFERENCES customer.id
primary key (id_start, id_end)
key id_end (id_end)
Now you can simply check for a free key by doing
SELECT count(*) as occupied FROM ids
WHERE 100 between id_start and id_end;
To check a free range do
SELECT count(*) as occupied FROM ids
WHERE NOT ('$low' > id_end) AND NOT ('$high' < id_start)
Upvotes: 0
Reputation: 425491
If the ranges do not intersect, then you may store them as pairs of INT
values:
CREATE TABLE customer_range
(
customerId INT,
rgStart INT,
rgEnd INT,
PRIMARY KEY (customerId, rgStart),
UNIQUE KEY (rgStart)
)
To query the customer a number belongs to, use this:
SELECT customerId
FROM customer_range
WHERE rgStart <= $mynum
AND rgEnd >= $mynum
ORDER BY
rgStart DESC
LIMIT 1
Upvotes: 9