Reputation: 33
I'm trying to translate a R code in SQL Query.
tab1 <- merge(tab1, tab1, by='col1', allow.cartesian = T)
tab1 <- tab1[col2.x != col2.y,]
I have to do a cartesian product on same table, based on key col1 column.
This is my situation:
col1 | col2
a | b
a | c
a | d
c | e
c | f
g | i
h | j
and I'm trying to get something like this:
col1 | col2 | col3
a | b | c
a | b | d
a | c | b
a | c | d
a | d | b
a | d | c
c | e | f
c | f | e
so I did a query like this:
SELECT a.*, b.col2 as col3
FROM tab1 a
JOIN tab1 b on
a.col1 = b.col1 AND
a.col2 <> b.col2
The query works, but it's really really slow compared to R. The result it's correct but with col3 different order:
col1 | col2 | col3
a | b | d
a | b | c
a | c | d
a | c | b
a | d | c
a | d | b
c | e | f
c | f | e
Is this the only way to do this? And also there is a way to increment the performarces? I have a table with like 2M rows, and with the cartesian I get like 20M rows.
In R it takes like 1 or 2 minutes to get the result, on PostgreSQL it takes like 10-15 minutes.
Thank you so much.
Upvotes: 0
Views: 966
Reputation: 33
Following the @jlandercy and @Edouard suggestions, I solved the problem creating a MATERIALIZED VIEW and an INDEX with the columns.
CREATE MATERIALIZED VIEW tab1_mat AS
SELECT *
FROM tab1
CREATE INDEX IF NOT EXISTS col1_col2 ON tab1 USING BTREE (col1, col2) ;
After this, I'm able to do my query:
SELECT a.*, b.col2 as col3
FROM tab1_mat a
JOIN tab1_mat b on
a.col1 = b.col1 AND
a.col2 <> b.col2
Then I scheduled a job that every night updates my materialized view using this command.
REFRESH MATERIALIZED VIEW tab1_mat;
Upvotes: 0
Reputation: 7075
"The result it's correct but with col3 different order"
To get the right order, you have to add the ORDER BY 1,2,3
clause to your self-joined query, but this is going to make it even slower ...
"there is a way to increment the performances ?"
Yes if you don't have yet created the following index, it should dramatically accelerate your self-joined query :
CREATE INDEX IF NOT EXISTS col1_col2 ON tab1 USING BTREE (col1, col2) ;
Upvotes: 1