Efficiently querying a graph structure

Question

I have a database which consists of a graph. The table I need to access looks like this:

Sno    Source    Dest
1       'jack'   'bob'
2       'jack'   'Jill'
3       'bob'    'Jim'

Here Sno is the primary key. Source and Destination are 2 non-unique numbers which represents an edge between nodes in my graph. My Source and Dest may also be strings and not necessarily an number data type. I have around 5 million entries in my database and I have built it using Postgresql with Psycopg2 for python.

It is very easy and quick to query for the primary key. However, I need to frequently query this database for all the dest a particular source is connected to. Right now I achieve this by calling the query:

SELECT * FROM name_table WHERE Source = 'jack'

This turns out to be quite inefficient (Up to 2 seconds per query) and there is no way that I can make this the primary key as it is not unique. Is there any way that I can make an index based on these repeated values and query it quickly?

fdomig · Accepted Answer

This should make your query much faster.

CREATE INDEX table_name_index_source ON table_name Source;

However there are many options which you can use

PostgreSQL Documentation

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table [ USING method ]
    ( { column | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace ]
    [ WHERE predicate ]

Read more about indexing with PostgreSQL in their Documentation.

Update

If your table is that small as yours, this will for help for sure. However if your dataset is growing you should probably consider a schema change to have unique values which can be indexed more efficiently.

Efficiently querying a graph structure

Answers (1)

Related Questions