myermian
myermian

Reputation: 32525

Database Search on unique column: Long string or many int's or long's?

I'm not sure what Database I will be going with (more likely SQL Server Express), so I don't know if that makes a difference (or that much of a difference) to matter.

Basically I am looking to store my object in a database so I can search for a unique object.

public class FooBar
{
    public GridItem[,]  Items { get; set; } //This is a 5x4 grid
}

public enum GridItem
{
    a = 0,
    b,
    c
}

At first I represented each GridItem as a 2 character binary (A = 00, B = 01, C = 10 -- I don't think this bogged my application that much building the string from the array) which gave me a 40 character string. I can search for this string in the database to match, but it got me thinking. Is it more efficient to leave each GridItem as an Int32 (or Int64) and search the database to see if ALL columns (GItem00, GItem01, ... GItem54) matches up to their appropriate row/column GridItem. I think the Int32 vs Int64 would probably have to do with the processor, so that's not as big of a deal. Basically, if speed is my #1 concern (not storage) which is better... spit out an 80 character string or store 20 different Int32's into the Database and search on those columns?

Or, is there something even better, such as serializing the object into a binary and somehow being able to search for a matching blob? I'm not really a database guy, so I have no idea.

Upvotes: 1

Views: 367

Answers (3)

Pavel Urbančík
Pavel Urbančík

Reputation: 1496

If I understand your question correctly, you want to match entire instances of FooBar (or binary representation of it) in database? 5x4 grid = 20 items, 2 bit each = 40bit = 5 bytes => Int64 column. You can't get anything faster satisfying your requirements.

Upvotes: 0

Mike Lue
Mike Lue

Reputation: 839

I have not faced such issue before, but I have some theories about better speed.

When system persist data as 40-byte characters and there is a index on it, the index would be as short as enough to distinguish the exactly record poisition of data. For example:

0101101.... => 010(3-byte index)
0111111.... => 011(3-byte index)

In another way, when system persist data as 8-byte(Int64) integer and there is a index on it, the index should be exactly 8 bytes per record.

In generic database theory, the less storage used, the more query performance gained.

If your data is much enough that database need all of the characters(40-byte character) to index the record, the size of index would be 40-byte on some records. And the 8-byte integer index, as explained, still stay in 8 bytes however data grown.

There is a precondition in above theory: the matched data should only occupy as a small part of all.

There is a significant factor to concern for effort of index maintenance: You need 20 indexes(logically) to speed up the 20 Int32's strategy. There is only one index, indeed, needed for 80-character strategy and for single Int64 strategy.


Let's explain if the index doesn't work, which means the database system execute query using full-table-scan(FTS) strategy.

We assume the 40-byte(character) data is persisted as 40 bytes per record, every page in SQL Server can hold 8K * 1024 / 40 = 204 records.

For 8-byte(Int64) data with 8 bytes per record, every page in SQL Server can hold 8K * 1024 / 8 = 1024 records.

If you have 20000 records, database need 20000 / 204 = 99 I/O's to perform FTS, and 20000 / 1024 = 20 I/O's for the other one.

The less I/O's needed, the more performance gained.

Upvotes: 1

MrFox
MrFox

Reputation: 5136

Enums are not very usefull for this, if you know which index number you want, simply access the data there. Also after Foo[,] you should specify variable name, you cannot use the enum name there.

Upvotes: 0

Related Questions