Reputation: 14641
I have a large table (+- 1 million rows, 7 columns including the primary key). The table contains two columns (ie: symbol_01
and symbol_02
) that are indexed and used for querying. This table contains rows such as:
id symbol_01 symbol_02 value_01 value_02
1 aaa bbb 12 15
2 bbb aaa 12 15
3 ccc ddd 20 50
4 ddd ccc 20 50
As per the example rows 1 and 2 are identical except that symbol_01
and symbol_02
are swapped but they have the same values for value_01
and value_02
. That is true once again with row 3 and 4. This is the case for the entire table, there are essentially two rows for each combination of symbol_01
+symbol_02
.
I need to figure out a better way of handling this to get rid of the duplication. So far the solution I am considering is to just have one column called symbol
which would be a combination of the two symbols, so the table would be as follows:
id symbol value_01 value_02
1 ,aaa,bbb, 12 15
2 ,ccc,ddd, 20 50
This would cut the number of rows in half. As a side note, every value in the symbol
column will be unique. Results always need to be queried for using both symbols, so I would do:
select value_01, value_02
from my_table
where symbol like '%,aaa,%' and symbol like '%,bbb,%'
This would work but my question is around performance. This is still going to be a big table (and will get bigger soon). So my question is, is this the best solution for this scenario given that symbol
will be indexed, every symbol
combination will be unique, and I will need to use LIKE
to query results.
Is there a better way to do this? Im not sure how great LIKE
is for performance but I don't see an alternative?
Upvotes: 2
Views: 42
Reputation: 125244
With this index:
create index symbol_index on t (
least(symbol_01, symbol_02),
greatest(symbol_01, symbol_02)
)
The query would be:
select *
from t
where
least(symbol_01, symbol_02) = least('aaa', 'bbb')
and
greatest(symbol_01, symbol_02) = greatest('aaa', 'bbb')
Or simply delete the duplicates:
delete from t
using (
select distinct on (
greatest(symbol_01, symbol_02),
least(symbol_01, symbol_02),
value_01, value_02
) id
from t
order by
greatest(symbol_01, symbol_02),
least(symbol_01, symbol_02),
value_01, value_02
) s
where id = s.id
Depending on the columns semantics it might be better to normalize the table as suggested by @Bohemian
Upvotes: 1
Reputation: 425033
There's no high performance solution, because your problem is shoehorning multiple values into one column.
Create a child table (with a foreign key to your current/main table) to separately hold all the individual values you want to search on, index that column and your query will be simple and fast.
Upvotes: 2