Reputation: 283
I am trying to write a for-loop statement for the following scenario:
I have used a SELECT to read multiple TVARVC entries into `T_TVARVC`.
LOOP AT t_tvarvc INTO DATA(s_tvarvc).
  CASE s_tvarvc-name.
    WHEN c_augru.
      s_tvarvc_range = CORRESPONDING #( s_tvarvc ).
      APPEND s_tvarvc_range TO t_augru.
    WHEN c_vkorg.
      s_tvarvc_range = CORRESPONDING #( s_tvarvc ).
      APPEND s_tvarvc_range TO t_vkorg.
  ENDCASE.
ENDLOOP.
This is what I have come up with:
DATA(t_augru) = VALUE tt_tvarvc( FOR s_tvarvc IN t_tvarvc
                                 WHERE ( name = c_augru )
                                 ( CORRESPONDING #( s_tvarvc ) ) ).

DATA(t_vkorg) = VALUE tt_tvarvc( FOR s_tvarvc IN t_tvarvc
                                 WHERE ( name = c_vkorg )
                                 ( CORRESPONDING #( s_tvarvc ) ) ).
My observation is that, with the `LOOP AT` and `CASE` combination, the number of iterations equals the number of entries in `T_TVARVC`.
But with one `FOR` loop per range table, `T_TVARVC` has to be traversed once for each target table, causing more iterations than in the first scenario.
Can this be written in a more efficient way?
Upvotes: 1
Views: 8325
Reputation: 10621
You can also try the specialized `FILTER` operator to achieve the same:
DATA lt_tadir TYPE SORTED TABLE OF tadir WITH NON-UNIQUE KEY object.
SELECT * FROM tadir UP TO 1000000 ROWS INTO TABLE lt_tadir ORDER BY PRIMARY KEY.
DATA(clas) = FILTER #( lt_tadir USING KEY primary_key WHERE object = 'CLAS' ).
DATA(trans) = FILTER #( lt_tadir USING KEY primary_key WHERE object = 'TRAN' ).
I took the snippet prepared by @peterulb for 1M table and here are my measurements:
0.032947, 1000000, 128776, 0 <- FILTER
0.139579, 1000000, 128776, 0 <- standard
0.239092, 1000000, 128776, 0 <- sorted_secondary
0.242161, 1000000, 128776, 0 <- sorted_secondary_val
Although `FILTER` uses inline declarations and transfers all source fields into the result table (by design), it executes an order of magnitude faster than the other variants.
I do not claim this will be the rule of thumb on all databases, but at least ABAP 7.54 on HANA DB utilizes some built-in optimizations for the `FILTER` operator.
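Transferred to the TVARVC scenario from the question, a minimal sketch could look like this (the constants `c_augru`/`c_vkorg` are assumed from the question; `FILTER` requires a sorted or hashed key on the filter column, hence the sorted table):

```abap
" Sorted primary key on NAME so that FILTER can use it.
DATA t_tvarvc TYPE SORTED TABLE OF tvarvc WITH NON-UNIQUE KEY name.

SELECT * FROM tvarvc INTO TABLE @t_tvarvc
  WHERE name = @c_augru OR name = @c_vkorg.

" Each FILTER is a key access, not a full scan.
DATA(t_augru) = FILTER #( t_tvarvc WHERE name = c_augru ).
DATA(t_vkorg) = FILTER #( t_tvarvc WHERE name = c_vkorg ).
```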
Upvotes: 1
Reputation: 2998
Just to add another answer relating to this part:
My observation is that, by using LOOP AT and CASE statement combo, the number of iterations will be same as the number of entries in T_TVARVC.
But when using a FOR loop for each range table, T_TVARVC has to be traversed more times to reach the desired entry thus causing multiple iterations more than the first scenario.
This is only true when you have no sorted index for the field in question. Assume that instead of `WHERE ( name = c_vkorg )` you use `USING KEY sk_name WHERE ( name = c_vkorg )`. This finds the index where the values you are searching for start in O(log n) time. It will then only process the lines matching the key, never looping over anything else.
This can potentially save a huge amount of time.
| Index | Val1 | Val2 | Val3 (sorted index) | |
|---|---|---|---|---|
| 1 | A | 9999 | AAA | |
| 2 | B | 1213 | AAB | |
| 3 | C | 554 | AAC | |
| ... | ... | ... | ... | |
| 500 | X | 1 | AUGUR | <-- loop starts here |
The downside is that a sorted secondary key will also take time to build (and some memory). This can be less of an issue if you have other code also requiring fast access.
Secondary keys are lazy: they are built the first time they are used.
In your scenario, you have to decide what's worth it. Are there frequent read accesses requiring the key? How many rows? Is it more expensive to build up the key access because the key isn't needed elsewhere? How often will the secondary key be invalidated, etc etc.
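Translated back to the question's tables, a sketch of the secondary-key declaration and of the `FOR` loop using it might look like this (`tt_tvarvc`, `c_augru` and the other identifiers are assumed from the question):

```abap
" Standard table with a lazy, non-unique sorted secondary key on NAME.
DATA t_tvarvc TYPE STANDARD TABLE OF tvarvc
     WITH EMPTY KEY
     WITH NON-UNIQUE SORTED KEY sk_name COMPONENTS name.

" ... fill t_tvarvc via SELECT ...

" The WHERE condition is evaluated via the sorted key: binary search
" to the first match, then only matching lines are visited.
DATA(t_augru) = VALUE tt_tvarvc( FOR s_tvarvc IN t_tvarvc USING KEY sk_name
                                 WHERE ( name = c_augru )
                                 ( CORRESPONDING #( s_tvarvc ) ) ).
```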
(Note: If you uncomment the xsdbool, you exclude the time it takes to build the secondary key from the measurement).
REPORT ztest.

START-OF-SELECTION.
  PERFORM standard.
  PERFORM sorted_secondary.
  PERFORM sorted_secondary_val.

FORM standard.
  DATA t_tadir TYPE STANDARD TABLE OF tadir WITH EMPTY KEY.
  DATA t_clas TYPE STANDARD TABLE OF tadir-obj_name WITH EMPTY KEY.
  DATA t_tran TYPE STANDARD TABLE OF tadir-obj_name WITH EMPTY KEY.

  SELECT * FROM tadir UP TO 1000000 ROWS INTO TABLE @t_tadir ORDER BY PRIMARY KEY.
* DATA(dummy) = xsdbool( line_exists( t_tadir[ key primary_key object = 'CLAS' ] ) ).

  GET RUN TIME FIELD DATA(t1).
  LOOP AT t_tadir ASSIGNING FIELD-SYMBOL(<s_tadir>).
    CASE <s_tadir>-object.
      WHEN 'CLAS'.
        APPEND <s_tadir>-obj_name TO t_clas.
      WHEN 'TRAN'.
        APPEND <s_tadir>-obj_name TO t_tran.
    ENDCASE.
  ENDLOOP.
  GET RUN TIME FIELD DATA(t2).

  WRITE: |{ ( t2 - t1 ) / '1000.0' / '1000.0' }, { lines( t_tadir ) }, { lines( t_clas ) }, { lines( t_tran ) }|.
  NEW-LINE.
ENDFORM.

FORM sorted_secondary.
  DATA t_tadir TYPE STANDARD TABLE OF tadir WITH NON-UNIQUE SORTED KEY sk_object COMPONENTS object.
  DATA t_clas TYPE STANDARD TABLE OF tadir-obj_name WITH EMPTY KEY.
  DATA t_tran TYPE STANDARD TABLE OF tadir-obj_name WITH EMPTY KEY.

  SELECT * FROM tadir UP TO 1000000 ROWS INTO TABLE @t_tadir ORDER BY PRIMARY KEY.
* DATA(dummy) = xsdbool( line_exists( t_tadir[ key sk_object object = 'CLAS' ] ) ).

  GET RUN TIME FIELD DATA(t1).
  LOOP AT t_tadir ASSIGNING FIELD-SYMBOL(<s_tadir>) USING KEY sk_object WHERE object = 'CLAS'.
    APPEND <s_tadir>-obj_name TO t_clas.
  ENDLOOP.
  LOOP AT t_tadir ASSIGNING <s_tadir> USING KEY sk_object WHERE object = 'TRAN'.
    APPEND <s_tadir>-obj_name TO t_tran.
  ENDLOOP.
  GET RUN TIME FIELD DATA(t2).

  WRITE: |{ ( t2 - t1 ) / '1000.0' / '1000.0' }, { lines( t_tadir ) }, { lines( t_clas ) }, { lines( t_tran ) }|.
  NEW-LINE.
ENDFORM.

FORM sorted_secondary_val.
  DATA t_tadir TYPE STANDARD TABLE OF tadir WITH NON-UNIQUE SORTED KEY sk_object COMPONENTS object.
  DATA t_clas TYPE STANDARD TABLE OF tadir-obj_name WITH EMPTY KEY.
  DATA t_tran TYPE STANDARD TABLE OF tadir-obj_name WITH EMPTY KEY.

  SELECT * FROM tadir UP TO 1000000 ROWS INTO TABLE @t_tadir ORDER BY PRIMARY KEY.
* DATA(dummy) = xsdbool( line_exists( t_tadir[ key sk_object object = 'CLAS' ] ) ).

  GET RUN TIME FIELD DATA(t1).
  t_clas = VALUE #( FOR <fs> IN t_tadir USING KEY sk_object WHERE ( object = 'CLAS' ) ( <fs>-obj_name ) ).
  t_tran = VALUE #( FOR <fs> IN t_tadir USING KEY sk_object WHERE ( object = 'TRAN' ) ( <fs>-obj_name ) ).
  GET RUN TIME FIELD DATA(t2).

  WRITE: |{ ( t2 - t1 ) / '1000.0' / '1000.0' }, { lines( t_tadir ) }, { lines( t_clas ) }, { lines( t_tran ) }|.
  NEW-LINE.
ENDFORM.
Also: `LOOP AT ... ASSIGNING` (or `REFERENCE INTO`) is likely to be faster than `LOOP AT ... INTO`. Since you make no write access that shouldn't be reflected in the original data source, there's no reason to copy every line in every loop step.
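A minimal sketch of the two copy-free variants, using the question's table name:

```abap
" Field symbol: each loop step points directly at the table line, no copy.
LOOP AT t_tvarvc ASSIGNING FIELD-SYMBOL(<s_tvarvc>).
  " read-only access, e.g. <s_tvarvc>-name
ENDLOOP.

" Data reference: same idea, dereferenced with the -> selector.
LOOP AT t_tvarvc REFERENCE INTO DATA(r_tvarvc).
  " read-only access, e.g. r_tvarvc->name
ENDLOOP.
```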
Upvotes: 3
Reputation: 13646
I agree with your observation about doubling the iterations. To make it faster, I think the only solution is a single loop, given that the internal table is not already sorted; that limits the possible solutions a lot. I came up with this:
TYPES: tt_tvarvc TYPE STANDARD TABLE OF tvarvc WITH EMPTY KEY,
       BEGIN OF ty_ranges,
         t_augru TYPE tt_tvarvc,
         t_vkorg TYPE tt_tvarvc,
       END OF ty_ranges.

CONSTANTS: c_augru TYPE tvarvc-name VALUE 'AUGRU',
           c_vkorg TYPE tvarvc-name VALUE 'VKORG'.

DATA(t_tvarvc) = VALUE tt_tvarvc( FOR i = 1 WHILE i <= 100 ( name = c_augru )
                                                           ( name = c_vkorg ) ).

DATA(ranges) = REDUCE ty_ranges(
    INIT ranges2 = VALUE ty_ranges( )
    FOR <tvarv> IN t_tvarvc
    NEXT ranges2-t_augru = COND #( WHEN <tvarv>-name = c_augru
                                   THEN VALUE #( BASE ranges2-t_augru ( <tvarv> ) )
                                   ELSE ranges2-t_augru )
         ranges2-t_vkorg = COND #( WHEN <tvarv>-name = c_vkorg
                                   THEN VALUE #( BASE ranges2-t_vkorg ( <tvarv> ) )
                                   ELSE ranges2-t_vkorg ) ).
(you will use `ranges-t_augru` and `ranges-t_vkorg` instead of `t_augru` and `t_vkorg` in your code)
You can immediately see that this code is much less legible than either of your two snippets.
Moreover, there is no gain in performance compared to your classic loop.
Back to your snippet with two FOR iterations: the intent is much clearer than with the classic loop (my opinion). It is of course slower, but you probably don't need to gain a few microseconds, so I think it's the best solution (still my opinion).
Upvotes: 3