Reputation: 1029
I have a database which consists of a graph. The table I need to access looks like this:
Sno Source Dest
1 'jack' 'bob'
2 'jack' 'Jill'
3 'bob' 'Jim'
Here Sno
is the primary key. Source and Destination are 2 non-unique numbers which represents an edge between nodes in my graph. My Source
and Dest
may also be strings and not necessarily an number data type. I have around 5 million entries in my database and I have built it using Postgresql with Psycopg2 for python.
It is very easy and quick to query for the primary key. However, I need to frequently query this database for all the dest
a particular source is connected to. Right now I achieve this by calling the query:
SELECT * FROM name_table WHERE Source = 'jack'
This turns out to be quite inefficient (Up to 2 seconds per query) and there is no way that I can make this the primary key as it is not unique. Is there any way that I can make an index based on these repeated values and query it quickly?
Upvotes: 2
Views: 115
Reputation: 4457
This should make your query much faster.
CREATE INDEX table_name_index_source ON table_name Source;
However there are many options which you can use
PostgreSQL Documentation
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table [ USING method ]
( { column | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace ]
[ WHERE predicate ]
Read more about indexing with PostgreSQL in their Documentation.
Update
If your table is that small as yours, this will for help for sure. However if your dataset is growing you should probably consider a schema change to have unique values which can be indexed more efficiently.
Upvotes: 4