Sean Hanley
Sean Hanley

Reputation: 5737

How do I best store a list of numbers in a relational database?

I want to store a list of numbers (essentially, a set in mathematical terms) in a relational database, specifically SQL Server 2005.

Ideally, I'd like it to be a single column on a given table, but I'm willing to hear any sort of solution. The data I need to store is, like I said, a set of numbers.

This column is meant to store all of the "step numbers" of a given process that the row applies to. Each row can, therefor, apply to one or more steps, in any order, range, or sequence. The maximum number of steps possible (the max range, essentially) is different from row to row, though I highly doubt any of them will get into the hundreds, so in 99.9% of cases the maximum should never exceed 20 or 30, and I'd be surprised if it ever got anywhere close to 100. Each row is guaranteed to have one value (step) at minimum (i.e. it doesn't make sense to have a row that doesn't apply to any step), but I figure this is as simple as setting the column to not null.

However it is stored, I'd like it to be easily searched. For instance, I'd rather not have to jump through a lot of hoops to write an SQL query to find all rows that apply to "step 3" for instance. If a given row has several steps it applies to (say, 2, 3, 7, and 8), it shouldn't be too difficult to match it when searching by step 3.

Also, while I'd like it to make some sort of logical sense when looking at the raw data (for anyone that need work on the system after I'm not around to ask and so they don't have to read thick documentation to figure out my obscure encoding), I'm willing to compromise on this. Encoding the list into something that can be reliably decoded is, thus, acceptable.

I apologize if this is a dupe — I've been googling around but I suspect this issue of mine suffers from not knowing what to search for or how to phrase or call it to find what I'm looking for.

On a more commentary note, I wonder if this isn't one of those areas where relational databases fall short. Unfortunately, I don't have a choice here. I must store it in SQL Server. Saving separately to a file or some other persistent data storage is out of the question, I'm afraid.

Upvotes: 9

Views: 7833

Answers (7)

Sean Hanley
Sean Hanley

Reputation: 5737

Ended up using a solution to a similar question.

Thanks anyway, though! I enjoy reading everyone's opinions on these esoteric areas of database design.

Upvotes: 1

polyglot
polyglot

Reputation: 2058

create table setmember (setid int, setmemberid int)
create unique nonclustered index idx_setmember_idx1 on setmember (setid, setmemberid)

Let me assume a magic number (-1 or 999999999) for "all".

This will be highly performant for both querying on a per set basis, and update insertion via the nonclustered index. Uniqueness enforces no repetition of entries. It is problematic to enforce either "all" or else multiple set members as a constraint, but there are diminishing returns though it could be done in a trigger.

Additionally add

create nonclustered index idx_setmember_idx2 on setmember (setmemberid, setid)

to allow for efficient reverse lookup queries.

If you use array types, you may be unable to implement efficient reverse lookup.

Note all the SQL above is ANSI compliant.

Upvotes: 1

rfusca
rfusca

Reputation: 7705

If you aren't tied to SQL Server, Postgresql has great support for this kind of thing via an array. They even have a special value for infinity.

If you're tied to SQL Server,MitMaro's way is best.

Upvotes: 1

Jeffrey Shackelford
Jeffrey Shackelford

Reputation: 191

The answer below to do a subtable (MitMaro) is the "standard" way.

IF you MUST put a set of numbers into one column or a table though the only way I can imagine is to use bitwise operations to store the set and you can use bitwise operations in you data queries to look for specific bits being set. Quick google searching indicates MSSql 2005 supports this but only up to 32-bit int, so if you steps pass 32 you will encounter issues.

All in all, the subtable is the most standard it would make for somewhat more understandable queries against the table(s). This is also the safest to support any future case where you would ned larger than 32 value maps.

Upvotes: 1

Wojciech Bederski
Wojciech Bederski

Reputation: 3922

And why additional table steps(processID JOIN, step INT) is not an option? I'm pretty sure it would be the easiest to maintain/code.

SELECT process.name FROM process, steps WHERE process.id = steps.processID AND steps.step = 3;

Pardon my SQL, but it's been a while :)

EDIT: UNIQUE(processID, step) would be advisable.

Upvotes: 2

MitMaro
MitMaro

Reputation: 5937

I can't remember the correct terminology for this but the correct way to do this would be to create a table like the one below:

|  id  |  table1_id  |  value  |
--------------------------------
|   0  |          1  |      1  |
|   1  |          1  |      2  |
|   2  |          1  |      3  |
|   3  |          1  |      7  |
|   4  |          1  |      9  |
|   5  |          2  |      1  |
|   6  |          2  |      3  |
| ...  |        ...  |    ...  |

For each value in table1 you add the required values into this table.

For 'all' you can create a column in table1 which is a flag you can set if you want all. (I use 'enum' in MySql but I am not sure if this exists in SQL Server).

I am not sure if there is some Sql Server specific way of doing this since I use mostly MySql.

Upvotes: 4

ttarchala
ttarchala

Reputation: 4577

I'd use a simple and canonical relational design: CREATE TABLE ranges (process_id int, num_low int, num_hi int). Latter two columns specify range. Independent index on each column. For "special" infinity values just use maxints or additional boolean columns.

Advantages: easy search for whether a particular numer is in range, or whether ranges intersect. Easy maintenance. General understandability and simplicity.

Disadvantages: some logic required when modifying the set, i.e. checking whether newly inserted or modified range intersects. Splicing ranges may be required.

Upvotes: 0

Related Questions