Katia
Katia

Reputation: 729

table partitions , which is better?

I have TimeData table which has to be partitioned into quarters for each year from 2009 till 2013 and none for the next years

I made 2 approaches:

1

CREATE TABLE TimeData (
id      NUMBER PRIMARY KEY NOT NULL,
day_name varchar(45),
day     NUMBER(2),
month   NUMBER(2),
quart   NUMBER(1),
year    NUMBER(4)
)

 PARTITION BY LIST (year)
 SUBPARTITION BY LIST (quart)
 (
  SUBPARTITION q1 values (1),
  SUBPARTITION q2 values (2),
  SUBPARTITION q3 values (3),
  SUBPARTITION q2 values (4)
)
  (
SUBPARTITION y_09 VALUES (2009),  
SUBPARTITION y_10 VALUES (2010),  
SUBPARTITION y_11 VALUES (2011),  
SUBPARTITION y_12 VALUES (2012),  
SUBPARTITION other VALUES (DEFAULT),
 ) ;

2

CREATE TABLE TimeData (
id      NUMBER PRIMARY KEY NOT NULL,
day_name varchar(45),
day     NUMBER(2),
month   NUMBER(2),
year    NUMBER(4)
)

PARTITION BY LIST (year)
SUBPARTITION BY RANGE (month)
(
  SUBPARTITION q1 values less than(4),
  SUBPARTITION q2 values less than(7),
  SUBPARTITION q3 values less than(10),
  SUBPARTITION q2 values less than(13)
)
(
SUBPARTITION y_09 VALUES (2009),
SUBPARTITION y_10 VALUES (2010),
SUBPARTITION y_11 VALUES (2011),
SUBPARTITION y_12 VALUES (2012),
SUBPARTITION other VALUES (DEFAULT),
 );

both approaches keep partitioning after 2012 , I couldn't figure out how overcome this
but the question is , does having the 'quart' field makes it better ? less calculations maybe
or without it , less storage ?!

** update
a 3rd approach just popped into my head , is to have 16 partition (4 quarters * 4 years) and the 17th partition is values less than (maxvalue) .. this way I can over come the forever partitioning thing , right?

Upvotes: 2

Views: 322

Answers (2)

Jon Heller
Jon Heller

Reputation: 36807

@Glen has a lot of good ideas, but if you're open to a more significant redesign you may want to consider using a single date column and interval partitioning.

create table TimeData
(
    id number primary key,
    the_date date
)
partition by range(the_date)
interval (numToYMInterval(3, 'month'))
(
    partition first_partition values less than (date '0001-01-01')
);

Using a single date column instead of multiple number and varchar columns has several significant advantages:

  • Uses significantly less storage
  • Removes many potential data issues
  • Provides much more useful information to the optimizer
  • Simplifies many queries (you'll need to be familiar with the Oracle date functions and formats, but you won't need to rebuild the date)

Interval paritioning can significantly improve maneagability; you'll never need to worry about pre-creating new partitions.

Upvotes: 0

Glenn
Glenn

Reputation: 9150

It's been a while since I worked with partitioning, so take this with a grain of salt...

If it really is the case that you are dealing with 16 fixed partitions, nothing thereafter, and you only want those 16 partitions and never more, then could simply use range partitions where the first quarter extends to the beginning of time and the last quarter to the end of time (replace date with your own breakdown):

PARTITION BY RANGE (date)
  (PARTITION p2009_q1 VALUES LESS THAN (TO_DATE('2009-04-01', 'YYYY-MM-DD')),
   PARTITION p2009_q2 VALUES LESS THAN (TO_DATE('2009-07-01', 'YYYY-MM-DD')),
   PARTITION p2009_q3 VALUES LESS THAN (TO_DATE('2009-10-01', 'YYYY-MM-DD')),
   PARTITION p2009_q4 VALUES LESS THAN (TO_DATE('2010-01-01', 'YYYY-MM-DD')),
   PARTITION p2010_q1 VALUES LESS THAN (TO_DATE('2010-04-01', 'YYYY-MM-DD')),
   ...
   PARTITION p2013_q3 VALUES LESS THAN (TO_DATE('2014-09-01', 'YYYY-MM-DD')),
   PARTITION p2013_q4 VALUES LESS THAN  MAXVALUE) 

Or you could just hash into 16 buckets.

Now for the aside. Questions that immediately pop into mind:

  • why does it need to be partitioned quarterly?
  • why does it need to be subpartitioned?
  • why only till 2013? (what will happen after that?)
  • after 2013, what will happen to the old data/partitions?
  • after initial loading, will new records only being added to the "current date" partition?
  • what kind of data volumes are we expecting per partition?

Partitioning is a physical attribute which will be driven by data usage. From my perspective, partition granulatiry is typically driven by data size and archiving requirements. For example, if capturing a million rows of log data a day, I might range partition by day, regularly pre-creating partitions for upcoming days and changing old days to read only. The data might only be useful for a week after which the oldest partition can be dropped or archived. Then we have a moving window of partitions. But if I'm only getting 10,000 records a week, I'd just create a rolling window of weekly partitions. Not that I really need a partition for only one week of data, but because it gives me an easy way to drop off/archive data older than a week (via the partition) as per the data retention requirements. The end user may view data by day or by hour or whatever.

So if the data is being viewed quarterly, it doesn't mean it can't be partitioned monthly if that makes sense. Try to pick a scheme that will allow you to more easily add partitions later if you can anticipate that requirement coming. For example, with range partitioning, you can start splitting the top partition when they ask for it in a year or two.

Oh, and by the way, if you name your partitions in a nice sortable manner (YYYY MM DD...), it becomes pretty easy to write a script that does a bit of dynamic sql to "alter table add partition" and look after partition creation (if that has not already been added as a feature). Whith that in mind, the first and last partitions above should be named a bit differently.

Upvotes: 3

Related Questions