أحمد صوالحة
أحمد صوالحة

Reputation: 323

performance for sum oracle

I have to sum a huge number of data with aggregation and where clause, using this query what I am doing is like this : I have three tables one contains terms the second contains user terms , and the third contains correlation factor between term and user term. I want to calculate the similarity between the sentence that that user inserted with an already existing sentences, and take the results greater than .5 by summing the correlation factor between sentences' terms The problem is that this query takes more than 15 min. because I have huge tables any suggestions to improve performance please?

     insert into PLAG_SENTENCE_SIMILARITY
     SELECT  plag_TERMS.SENTENCE_ID ,plag_User_TERMS.SENTENCE_ID,
      least(  sum( plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_terms.sentence_length,
      sum (plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_user_terms.sentence_length),
      plag_TERMs.isn,
      plag_user_terms.isn 
    FROM plag_TERM_CORRELATIONS3,
    plag_TERMS,
    Plag_User_TERMS
   WHERE ( Plag_TERMS.TERM_ROOT  = Plag_TERM_CORRELATIONS3.TERM1
   AND Plag_User_TERMS.TERM_ROOT = Plag_TERM_CORRELATIONS3.TERM2 
   AND Plag_User_Terms.ISN=123)
   having 
    least(  sum( plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_terms.sentence_length,
      sum (plag_TERM_CORRELATIONS3.CORRELATION_FACTOR)/ plag_user_terms.sentence_length) >0.5
      group by (plag_User_TERMS.SENTENCE_ID,plag_TERMS.SENTENCE_ID ,  plag_TERMs.isn,       plag_terms.sentence_length,plag_user_terms.sentence_length, plag_user_terms.isn);

plag_terms contains more than 50 million records and plag_correlations3 contains 500000

Upvotes: 0

Views: 2024

Answers (1)

peter.hrasko.sk
peter.hrasko.sk

Reputation: 4141

If you have a sufficient amount of free disk space, then create a materialized view

  • over the join of the three tables
  • fast-refreshable on commit (don't use the ANSI join syntax here, even if tempted to do so, or the mview won't be fast-refreshable ... a strange bug in Oracle)
  • with query rewrite enabled
  • properly physically organized for quick calculations

The query rewrite is optional. If you can modify the above insert-select, then you can just select from the materialized view instead of selecting from the join of the three tables.

As for the physical organization, consider

  • hash partitioning by Plag_User_Terms.ISN (with a sufficiently high number of partitions; don't hesitate to partition your table with e.g. 1024 partitions, if it seems reasonable) if you want to do a bulk calculation over all values of ISN
  • single-table hash clustering by Plag_User_Terms.ISN if you want to retain your calculation over a single ISN

If you don't have a spare disk space, then just hint your query to

  • either use nested loops joins, since the number of rows processed seems to be quite low (assumed by the estimations in the execution plan)
  • or full-scan the plag_correlations3 table in parallel

Bottom line: Constrain your tables with foreign keys, check constraints, not-null constraints, unique constraints, everything! Because Oracle optimizer is capable of using most of these informations to its advantage, as are the people who tune SQL queries.

Upvotes: 1

Related Questions