Lukerayner

Reputation: 432

Adding Extra HASH partitions to already HASH partitioned table

Hi, I currently have a table with 100 HASH partitions. I have decided that this now needs to be increased to 1000 partitions for future scaling.

Do I need to remove the partitioning from the table and then add the 1000 partitions afterwards, or is there a way to add the extra 900 partitions to the already-partitioned table?

I originally partitioned the table using the code below.

ALTER TABLE t1
PARTITION BY HASH(venue_id)
PARTITIONS 100;

Is there also a way to get an estimate of how long it will take to add 1000 partitions to my table? I will be using one of Percona's tools to do this, which will prevent the table from locking: https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html

Upvotes: 1

Views: 960

Answers (2)

Rick James

Reputation: 142366

PARTITION BY HASH is virtually useless. I don't expect it to help you with 100 partitions, nor with 1000.

You get more bang for your buck by arranging to have venue_id as the first column in the PRIMARY KEY.

Does the query always have a single venue_id? (If not, the options get messier.) For now, I will assume you always have WHERE venue_id = constant.

You have a multi-dimensional indexing problem. INDEXes are only one dimension, so things get tricky. However, partitioning can be used to sort of get a two-dimensional index.

Let's pick day_epoch as the partition key and use PARTITION BY RANGE(day_epoch). (If you change that from a 4-byte INT to a 3-byte DATE, then use PARTITION BY RANGE(TO_DAYS(day_epoch))).

Then let's decide on the PRIMARY KEY. Note: When adding or removing partitioning, the PK should be re-thought. Keep in mind that a PK is a unique index. And the data is clustered on the PK. (However, uniqueness is not guaranteed across partitions.)

So...

PARTITION BY RANGE(day_epoch)

PRIMARY KEY(venue_id, zone_id, id)  -- in this order

Without partitioning, I recommend

PRIMARY KEY(venue_id, zone_id, day_epoch, id)
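Putting the partitioned recommendation together, a minimal sketch (the column types, partition boundaries, and the extra INDEX(id) are assumptions, since the original table definition isn't shown). One caveat: MySQL requires the partition column to appear in every unique key, so day_epoch has to be appended to the end of the PK:

CREATE TABLE t1 (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,    -- type assumed
    venue_id INT UNSIGNED NOT NULL,
    zone_id INT UNSIGNED NOT NULL,
    day_epoch INT UNSIGNED NOT NULL,            -- assumed: days since 1970-01-01
    -- ... other columns ...
    PRIMARY KEY (venue_id, zone_id, id, day_epoch),  -- day_epoch appended only to satisfy MySQL's partitioning rule
    INDEX (id)                                  -- AUTO_INCREMENT must lead some index
)
PARTITION BY RANGE (day_epoch) (
    PARTITION p_2018 VALUES LESS THAN (17897),  -- 17897 = 2019-01-01
    PARTITION p_2019 VALUES LESS THAN (18262),  -- 18262 = 2020-01-01
    PARTITION p_max  VALUES LESS THAN MAXVALUE
);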

In general, any index (including the PK) should start with the column(s) tested with =, then those tested with IN, then at most one 'range'.

For the sake of the uniqueness requirement of the PK, I put the id last.
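To make that concrete, here is a hypothetical query shape (the values are made up) that lines up with PRIMARY KEY(venue_id, zone_id, day_epoch, id):

SELECT id
FROM t1
WHERE venue_id = 123           -- '=' column leads the index
  AND zone_id IN (1, 2, 5)     -- then the IN column
  AND day_epoch >= 17897       -- then at most one range
  AND day_epoch <  18262;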

So, the query performs something like this:

  1. "Partition pruning" -- probably down to a single partition, based on the date.
  2. Drill down the PK directly to the consecutive rows for the one venue_id in question.
  3. Hopscotch across the data based on the zone_ids. (In some situations, this may be a range scan instead of the jumping around. This depends on the version, the number of ids, the values of the ids, and perhaps the phase of the moon.)
  4. (If it makes it this far) Then get the desired date.
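You can sanity-check step 1 with EXPLAIN: from MySQL 5.7 the partitions column is shown by default (on 5.6 and earlier, use EXPLAIN PARTITIONS). Again, the values here are made up:

EXPLAIN SELECT id
FROM t1
WHERE venue_id = 123
  AND day_epoch >= 17897
  AND day_epoch <  18262;
-- the partitions column should list only the partition(s) covering that date range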

When fetching lots of rows from a huge table, the most important thing is to minimize disk hits. What I just described probably does that better than the alternatives. Partitioning on venue_id helps only with that one column, but fails to help with the rest.

Upvotes: 0

Bill Karwin

Reputation: 562641

You don't need to remove partitioning to repartition. The ALTER is going to copy the rows into a new table anyway, so you might as well do this in one step.

Just ALTER TABLE and define the new partitioning scheme:

ALTER TABLE t1
PARTITION BY HASH(venue_id)
PARTITIONS 1000;

Or with pt-online-schema-change:

pt-online-schema-change h=myhost,D=mydatabase,t=t1 \
  --alter "PARTITION BY HASH(venue_id) PARTITIONS 1000" \
  --execute

(The trailing backslashes are just line continuations to avoid line-wrapping; it's all one command.)
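Either way, you can verify the partition count afterwards (using the same database and table names as above):

SELECT COUNT(*) AS partition_count
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'mydatabase'
  AND TABLE_NAME = 't1';
-- should return 1000 once the repartitioning has finished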


I forgot to comment on your other question, about predicting the ETA for completion.

One advantage of the Percona script is that it reports progress, so you can get an estimate of completion from that. In our environment, though, we find that it's not very accurate: it can report that it's 99% complete for hours.

Also keep in mind that the Percona script is not 100% lock-free. It needs an exclusive metadata lock briefly at the start and end of its run, because it must create triggers at the start, then rename the tables and drop the triggers at the end. Any long-running query, even a read-only SELECT, will block that metadata lock. If you have trouble getting the script to finish, make sure any queries and transactions you run against the table finish quickly (or kill them if they don't).
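As a rough way to spot potential blockers before you start (a sketch; these information_schema views exist in MySQL 5.5+, and the 60-second threshold is arbitrary):

-- statements that have been running long enough to block the metadata lock
SELECT id, user, time, state, info
FROM information_schema.PROCESSLIST
WHERE command <> 'Sleep'
  AND time > 60;

-- open InnoDB transactions, oldest first
SELECT trx_mysql_thread_id, trx_started, trx_state
FROM information_schema.INNODB_TRX
ORDER BY trx_started;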

Upvotes: 1
