Thắng Nguyễn
Thắng Nguyễn

Reputation: 13

Why is a non-partitioned table with composite indexing performing better than a partitioned table?

I have a table A with around 300 million rows, where 2 million rows are inserted daily. The table A is partitioned by the MONTH_ID column, with each partition corresponding to a month, and has an index on the DATE column (I created an index within each partition).

CREATE TABLE Table_A (
    "TXN_REFR_NBR" VARCHAR2(100),
    "TXN_REFR_NBR_HANDLE" VARCHAR2(100),
    "BRANCH_CODE" VARCHAR2(50),
    "GRAND_BRANCH_LV2" VARCHAR2(50),
    "GRAND_BRANCH_LV1" VARCHAR2(50),
    "BRANCH_QLBC" VARCHAR2(50),
    "CCY" VARCHAR2(15),
    "TXN_DATE" DATE,
    "VALUE_DATE" DATE,
    "GL_CODE" VARCHAR2(100),
    "GL_NAME" VARCHAR2(100),
    "GL_CODE_NBR_HANDLE" VARCHAR2(100),
    "GL_NAME_HANDLE" VARCHAR2(200),
    "PL_CATEGORY" VARCHAR2(100),
    "DB_FCYAMOUNT" NUMBER,
    "DB_LCYAMOUNT" NUMBER,
    "CR_FCYAMOUNT" NUMBER,
    "CR_LCYAMOUNT" NUMBER,
    "NET_FCY" NUMBER,
    "NET_LCY" NUMBER,
    "DSC" VARCHAR2(4000),
    "EMPE_CODE" VARCHAR2(100),
    "CIF" VARCHAR2(100),
    "LINE_SYSTEM_GL78" VARCHAR2(50),
    "LINE" VARCHAR2(50),
    "CUST_TYPE" VARCHAR2(20),
    "CUST_DIVISION" VARCHAR2(20),
    "CATEGORY" VARCHAR2(50),
    "PRODUCT" VARCHAR2(50),
    "TENOR" NUMBER,
    "CHANNEL" VARCHAR2(50),
    "SOURCE" VARCHAR2(50),
    "LV1_ID" VARCHAR2(20),
    "LV2_ID" VARCHAR2(20),
    "LV3_ID" VARCHAR2(20),
    "LV4_ID" VARCHAR2(20),
    "LV1_NAME" VARCHAR2(200),
    "LV2_NAME" VARCHAR2(200),
    "LV3_NAME" VARCHAR2(200),
    "LV4_NAME" VARCHAR2(200),
    "ITEM_1" VARCHAR2(50),
    "ITEM_2" VARCHAR2(50),
    "ITEM_3" VARCHAR2(50),
    "PRODUCT_LV1" VARCHAR2(11),
    "PRODUCT_LV2" VARCHAR2(200),
    "PRODUCT_LV3" VARCHAR2(200),
    "MONTHID" NUMBER
) PARTITION BY LIST (MONTHID) (
    PARTITION MONTHID_01 VALUES (202401),
    PARTITION MONTHID_02 VALUES (202402),
    PARTITION MONTHID_03 VALUES (202403),
    PARTITION MONTHID_04 VALUES (202404),
    PARTITION MONTHID_05 VALUES (202405),
    PARTITION MONTHID_06 VALUES (202406),
    PARTITION MONTHID_07 VALUES (202407)
);
---Index the MONTHID and TXN_DATE for Table A
CREATE INDEX txn_date_idx ON Table_A (MONTHID, TXN_DATE) LOCAL (
    PARTITION MONTHID_01,
    PARTITION MONTHID_02,
    PARTITION MONTHID_03,
    PARTITION MONTHID_04,
    PARTITION MONTHID_05,
    PARTITION MONTHID_06,
    PARTITION MONTHID_07
);
---Create table B 
Insert into Table_B  (select * from Table_A);
Create index IDX_MONTHID_TXNDATE on Table_B(MONTHID,TXN_DATE);

Then, I created a duplicate of this table, called table B, without partitions. Instead, I created a composite index on the MONTH_ID and DATE columns for B.

After comparing the execution plans of queries on both tables (the query i use to compare below), I found that queries on the non-partitioned table B are faster than on the partitioned table A. This was unexpected, as I assumed that partitioning a large table (along with indexing) would improve performance over a non-partitioned table. Table B Table A

SELECT GL_CODE, sum(NET_LCY) FROM Table_A
WHERE MONTHID = 202406 AND TXN_DATE = to_date('2024-06-10', 'YYYY-MM-DD')
GROUP BY GL_CODE

Operation Object Optimizer Cost Cardinality Bytes
SELECT STATEMENT ALL_ROWS 144,642 261 7,569
HASH GROUP BY ALL_ROWS 144,642 261 7,569
PARTITION LIST (SINGLE) ANALYZED 144,588 2,101,708 60,949,532
TABLE ACCESS (BY LOCAL INDEX ROWID BATCHED) Table_A ANALYZED 144,588 2,101,708 60,949,532
INDEX RANGE SCAN TXN_DATE_IDX ANALYZED 5,591 2,101,899 0
SELECT GL_CODE, sum(NET_LCY) FROM Table_B
WHERE MONTHID = 202406 AND TXN_DATE = to_date('2024-06-10', 'YYYY-MM-DD')
GROUP BY GL_CODE
Operation Object Optimizer Cost Cardinality Bytes
SELECT STATEMENT ALL_ROWS 59,961 330 9,570
HASH GROUP BY ANALYZED 59,961 330 9,570
TABLE ACCESS (BY INDEX ROWID BATCHED) Table_B ANALYZED 59,946 603,088 17,489,552
INDEX RANGE SCAN Table_B MONTHID IDX ANALYZED 3,123 603,088 0

Could someone explain why the non-partitioned table B performs better than the partitioned table A? Are there specific cases where partitioning might not provide the expected performance benefits, or is there something about my indexing or partitioning strategy that could be optimized?

Thank you for any insights!

Anyones suggest me that i have to index or partition with large table like this?

Upvotes: 1

Views: 85

Answers (1)

Paul W
Paul W

Reputation: 11324

Partitioning does not improve query speed unless (1) your query prunes partitions by filtering on the partition key, and (2) your query is doing a full table scan to access the table. This is typical of a datawarehouse environment where queries crunch huge amounts of data. But in your example, you are using indexes in both scenarios, partitioning won't help performance.

In fact, it can actually hurt it. If you define an index on a partitioned table as LOCAL there are in fact many index segments, all separate b-tree structures. There is no overarching entry-point to these. So, if your query fails to prune partitions to a single partition, more than one index b-tree segment must be accessed and scanned. That means more work to find those rows. A global index on the other hand is a single index segment covering the whole table, so an index scan on it requires only one scan. So I would say it is quite typical for non-partitioned tables or partitioned tables with a global index to out-perform a partitioned table with local indexes if you are relying on those indexes. This is particularly acute for high cardinality indexes that support single-row lookups. For an index on a low cardinality column like TXN_DATE, the overhead of multiple scans is almost unnoticeable next to the amount of time for pulling table blocks.

Now, your precise example is a bit odd: you are in fact pruning partitions when you say MONTHID = 202406, so the locality of the index shouldn't be hurting you. I would say your two test queries should perform equally. When comparing, did you ensure that caching is not messing up your measurements? Are you sure they had the same data? Are you sure they were on the same tablespace? Did you check the wait interface to see if the wait profile was different? Did one have parallelism enabled and the other not? And I am assuming you actually ran these queries, and aren't simply relying on the costing from the execution plan (which is very frequently incorrect). Lots of things to check. But all other things being equal, your two test queries should perform identically.

My last comment is that with this partitioned table you are probably better off without any index on either the monthid or the txn_date field. Your queries should be pruning on monthid so that already narrows it down to one month - using an index to narrow that down to 1/30th is not as efficient as simply doing a full scan of the partition, especially if you enable parallel query. And definitely don't index monthid - if that's the partitioning key, it serves absolutely no purpose being indexed.

Upvotes: 1

Related Questions