user3830784
user3830784

Reputation: 355

Creating the optimum index for my database

I have a table in postgresql with the following information:

rawData (fileID integer references otherTable, lineNum integer, data1 double, ...)

When I am searching this table, I do so with the following query:

SELECT lineNum, data1, ...other data FROM rawData WHERE 
fileID = ? AND data1 < ? ORDER BY lineNum;

In general, the data in this table is a number of entries for each fileID, and each fileID has lineNum from 0 to x, with lineNum never repeating for each fileID (but it does repeat for different fileID's). Then data1 is effectively a random number that may or may not overlap.

In order to speed up the reading of this data, I am trying to create an index on it, but am having trouble figuring out the best way to index it. Currently I am looking at one of the following two index methods, and am wondering which would be better for my search, or if there is another option that I haven't thought of that would be better than either of them.

index idea 1:

CREATE INDEX searchIndex ON rawData (fileID, data1, lineNum);

index idea 2:

CREATE INDEX searchIndex ON rawData (fileID, lineNum, data1);

Note that at this time, this and a search not constrained by data1 are the only searches that I run on this table, so I'm not too concerned about this index slowing down other searches.

Lastly, would I have to change my search query to use the index, or would it automatically use that index when I search the table?

Upvotes: 0

Views: 39

Answers (2)

jjanes
jjanes

Reputation: 44137

Either index can be used. Which is faster will depend on many things, like how many rows are in the table, how many lineNum there are per fileID, how selective the data1 < ? clause is, what your hardware is, what our config settings are, which version of PostreSQL you are using, what physical order the table rows lie in, etc.

The only way to know for sure is to try it with your own data on your own system and see.

I'd just build an index on (fileID, lineNum, data1), or even just (fileID, lineNum), because that seems more natural, and then forget about it. Most likely it will be fast enough. Once there is a demonstrable performance problem, than you will have the test case at hand which is needed to come to a real conclusion.

Upvotes: 0

Robins Tharakan
Robins Tharakan

Reputation: 2473

You should look at using this instead:

CREATE INDEX searchIndex ON rawData (fileID, lineNum);

A few things:

  • In particular, as per docs, Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized.

  • Since your second search query requires filtering without the data1 column, keeping the second column lineNum should be sufficient (since you mention it would be quasi-random), and in the rare occurrence that there are repeats, table fetches should ensure correctness. But what this would mean is that the Index would be 1/3rd smaller in size, which is a big win (Think index small-enough to be in memory / index-only-scans etc.)

Upvotes: 1

Related Questions