Lucas Ag

Reputation: 43

Most efficient way to SELECT on a cluster table?

In my program, I need to perform a select on the CDPOS cluster table. However, this process is very inefficient and, depending on the amount of data, it can take several minutes or simply hit a timeout. In a test I conducted, retrieving 35 IDs took 1 minute and 15 seconds, and in a second test with 90 IDs, the program timed out.

Current process: First, I perform a select on the EKKO table with purchase orders filtered by the date provided by the user. Then, I filter the IDs obtained from the EKKO table in the CDHDR table. The remaining IDs from the CDHDR table are then used in the select on the CDPOS table. Finally, I use a check to filter non-key fields. Note: I filter by IDs because the CDPOS table does not have date fields.
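For context, a minimal sketch of those preceding steps might look like this. The field names, the date selection (EKKO-AEDAT with a range s_date), and the internal table names are assumptions; for purchase orders the change-document object class is EINKBELEG:

```abap
" Sketch only: select POs by the user-supplied date, then their change headers.
SELECT ebeln
  FROM ekko
  INTO TABLE lt_ekko
  WHERE aedat IN s_date.

IF lt_ekko IS NOT INITIAL.
  " CDHDR-OBJECTID is a long character field, so EBELN may need to be
  " moved into a compatible field before the FOR ALL ENTRIES comparison.
  SELECT objectclas objectid changenr
    FROM cdhdr
    INTO TABLE lt_chunk_cdhdr
    FOR ALL ENTRIES IN lt_ekko
    WHERE objectclas = 'EINKBELEG'
      AND objectid   = lt_ekko-ebeln.
ENDIF.
```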

The SELECT in which I am having performance issues:

IF lt_chunk_cdhdr IS NOT INITIAL.

    SELECT OBJECTID
           CHANGENR
           FNAME
           CHNGIND
           VALUE_NEW
           VALUE_OLD
      FROM CDPOS
      INTO TABLE lt_temp_cdpos
      FOR ALL ENTRIES IN lt_chunk_cdhdr
      WHERE OBJECTID = lt_chunk_cdhdr-objectid.

    LOOP AT lt_temp_cdpos INTO ls_cdpos.
      CHECK ( ls_cdpos-fname = 'FRGKE' AND ls_cdpos-value_new = '2' )
         OR ls_cdpos-fname = 'MWSKZ'
         OR ( ls_cdpos-fname = 'KEY' AND ls_cdpos-chngind = 'I' )
         OR ls_cdpos-fname = 'BRTWR'
         OR ls_cdpos-fname = 'NETWR'
         OR ls_cdpos-fname = 'ZTERM'
         OR ls_cdpos-fname = 'INCO1'.

      APPEND ls_cdpos TO lt_cdpos.
    ENDLOOP.
ENDIF.

I tried to improve the efficiency of the select using tips I found on the internet. One of these tips was to use only key fields in the WHERE clause when working with cluster tables and then use the CHECK command to filter the retrieved data again. However, even after following this advice, the efficiency is still insufficient. If the number of IDs to be filtered is close to 80, the program does not complete execution.

"Cluster table: the exact opposite occurs: due to the storage method of these tables, the WHERE clause should contain only the key fields, and the other fields should be checked using the CHECK command. The database cannot process cluster tables as it processes transparent tables. Forcing the database to decompress and check fields (in the case of a select with non-key fields in the WHERE clause) is less efficient, in most cases, than qualifying with key fields only and leaving the CHECK for non-key fields after they have been returned." Reference (in Portuguese): https://marcolin.wordpress.com/2010/04/08/grandes-viloes-da-performance-sap/

Another attempt I made was to limit the search period to 31 days. However, depending on the month and the number of purchase orders, this can still cause a timeout.

Upvotes: 1

Views: 992

Answers (3)

r04dRunErr

Reputation: 53

Sorry, I can't use comments yet, so here are my two cents for @Lucas Ag. It depends on the database, of course, but a general rule of thumb for good SQL query performance is to supply fields that are backed by an index, starting with the first field (most generic) and going towards the last (most specific). In other words, the point is not simply to supply as many key fields as possible, but to supply them in the correct order. In the case of cluster tables, the only index you have is the primary key. Say lt_chunk_cdhdr contains all change documents for a given class/object ID. Consider the two queries below, both of which refer to two key fields in the WHERE clause:

    SELECT OBJECTID
       CHANGENR
       FNAME
       CHNGIND
       VALUE_NEW
       VALUE_OLD
    FROM CDPOS
    INTO TABLE lt_temp_cdpos
    FOR ALL ENTRIES IN lt_chunk_cdhdr
    WHERE OBJECTID = lt_chunk_cdhdr-objectid
    AND   OBJECTCLAS = lt_chunk_cdhdr-OBJECTCLAS.

    SELECT OBJECTID
       CHANGENR
       FNAME
       CHNGIND
       VALUE_NEW
       VALUE_OLD
    FROM CDPOS
    INTO TABLE lt_temp_cdpos
    FOR ALL ENTRIES IN lt_chunk_cdhdr
    WHERE OBJECTID = lt_chunk_cdhdr-objectid
    AND   CHANGENR = lt_chunk_cdhdr-CHANGENR.

Both queries have only two key fields in the WHERE clause, but query #2 will be (much) slower, because it skips OBJECTCLAS, the leading key field after the client. However, to get maximum performance, I would recommend populating all three key fields; you already have them in the table lt_chunk_cdhdr anyway. The reason is the internal translation of FOR ALL ENTRIES into a native DB query and the implications of that. For example, if your object is of class KRED (vendor) with ID '1234567890' and it has 50 change documents (that is, 50 entries in CDHDR), the ABAP kernel may translate the statement into something like:

    select <list of fields> from cdpos
    where mandant = <your current system client>
    and objectclas = 'KRED' and objectid = '1234567890'  --record #1
    union all
    select <list of fields> from cdpos
    where mandant = <your current system client>
    and objectclas = 'KRED' and objectid = '1234567890'  --record #2
    union all
    ...
    union all
    select <list of fields> from cdpos
    where mandant = <your current system client>
    and objectclas = 'KRED' and objectid = '1234567890'  --record #50

This behaviour was observed on MS SQL Server; Oracle, on the other hand, uses OR instead of UNION ALL. The translation above shows the following:

  • data is selected in much smaller chunks than the internal table holds; the chunk size depends on the maximum SQL statement size and similar DB limitations;
  • if only the generic fields are specified in the query, the DB would have to return the same set of 50 records for each partial PK (that is our example with 50 change docs for one object; of course, the DB optimiser is quite smart and won't literally do that, but it still puts extra load on the DB).

Bottom line: the most efficient way to SELECT ... FOR ALL ENTRIES is to specify the full PK in the query:

    SELECT ...
    FROM CDPOS
    INTO TABLE lt_temp_cdpos
    FOR ALL ENTRIES IN lt_chunk_cdhdr
    WHERE OBJECTCLAS = lt_chunk_cdhdr-objectclas
    AND   OBJECTID = lt_chunk_cdhdr-objectid
    AND   CHANGENR = lt_chunk_cdhdr-CHANGENR.

Hope this helps someone.

Upvotes: 1

Sandra Rossi

Reputation: 13656

In an ABAP system before ABAP 7.53 (*), a "cluster table" is an ABAP construct that is not known to the database system as a table: the data of a cluster table is compressed by the ABAP kernel and stored inside its "table cluster", which is an actual table in the database (**).

For instance, CDPOS is the cluster table, and CDCLS is its table cluster (see below screenshots). CDCLS also contains the cluster table PCDPOS (you can see all its cluster tables via the "where-used list" of the transaction code SE11).

Only the columns of the primary key of a table cluster are known to the database. The remaining columns of the cluster table are compressed into one big column of the table cluster; they cannot be queried directly by the database and are known only to the ABAP kernel.

Your ABAP query on the cluster table is transformed into a query on the table cluster and sent to the database (***). So, when you write your query, you must apply the same performance tips to the table cluster as to other classic tables ("transparent tables" in SE11), i.e. filter the data via the key columns of the table cluster. You may omit some of the trailing key columns, but not the leading ones, otherwise the query may be quite slow. The other columns are irrelevant to performance because the database does not know them.

CDCLS has this primary key:

  • MANDT (not needed as it's added implicitly by the ABAP kernel)
  • OBJECTCLAS
  • OBJECTID
  • CHANGENR
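The effect of leaving out leading versus trailing key columns can be sketched as follows (lv_objectid is a hypothetical variable; EINKBELEG is the change-document class for purchasing documents):

```abap
" Fast: leading key columns OBJECTCLAS and OBJECTID are supplied and the
" trailing CHANGENR is omitted - the primary key index is still usable.
SELECT objectid changenr fname
  FROM cdpos
  INTO TABLE lt_cdpos
  WHERE objectclas = 'EINKBELEG'
    AND objectid   = lv_objectid.

" Slow: the leading column OBJECTCLAS is omitted - the primary key of the
" table cluster can no longer be used efficiently.
SELECT objectid changenr fname
  FROM cdpos
  INTO TABLE lt_cdpos
  WHERE objectid = lv_objectid.
```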

Here are the screenshots of the transaction code SE11 which show if a table is a cluster table (CDPOS) and the information of its table cluster (CDCLS):




(*) table cluster - ABAP Glossary:

"Before Release 7.53, database table in the database that contains the data of multiple cluster tables. Cluster tables are no longer supported from Release 7.53 and all table clusters were removed."

(**) Here is how data is stored in a table cluster, copyright help.sap.com, extracted July 5th, 2024:

Data repartition of cluster tables into a table cluster

(***) You may use the transaction code ST05 to run an SQL trace and see how your ABAP SQL query is transformed into the one to the database table cluster.

Upvotes: 3

Lucas Ag

Reputation: 43

I managed to resolve the performance issue by adding the OBJECTCLAS field to the filter in addition to OBJECTID. This change drastically reduced the processing time.

After adding the OBJECTCLAS field, the select looked like this:

IF lt_chunk_cdhdr IS NOT INITIAL.

    SELECT OBJECTID
           CHANGENR
           FNAME
           CHNGIND
           VALUE_NEW
           VALUE_OLD
      FROM CDPOS
      INTO TABLE lt_temp_cdpos
      FOR ALL ENTRIES IN lt_chunk_cdhdr
      WHERE OBJECTID = lt_chunk_cdhdr-objectid
      AND   OBJECTCLAS = lt_chunk_cdhdr-OBJECTCLAS.

    LOOP AT lt_temp_cdpos INTO ls_cdpos.
      CHECK ( ls_cdpos-fname = 'FRGKE' AND ls_cdpos-value_new = '2' )
         OR ls_cdpos-fname = 'MWSKZ'
         OR ( ls_cdpos-fname = 'KEY' AND ls_cdpos-chngind = 'I' )
         OR ls_cdpos-fname = 'BRTWR'
         OR ls_cdpos-fname = 'NETWR'
         OR ls_cdpos-fname = 'ZTERM'
         OR ls_cdpos-fname = 'INCO1'.

      APPEND ls_cdpos TO lt_cdpos.
    ENDLOOP.
ENDIF.

In my original approach, I was using only the OBJECTID field to filter the SELECT statement on the CDPOS cluster table. However, by including the OBJECTCLAS field as well, the database was able to optimize the query much more effectively, leading to significant performance improvements.

Before, when filtering 35 IDs, the program's processing time was around 1 minute and 15 seconds, but now with the change the time was reduced to 3 seconds.

With this, we can conclude that for optimal performance of a SELECT statement on a cluster table, you should filter using only key fields in the WHERE clause and include as many of them as possible, ideally the full primary key. Non-key fields should not be filtered in the SELECT statement; instead, use the CHECK command to filter them afterwards.

Upvotes: 2
