sheanineseven

Reputation: 59

Is altering the Page Size in SQL Server the best option for handling "Wide" Tables?

I have multiple tables in my application that are both very wide and very tall. The width comes from having 10-20 columns with a variety of datatypes: varchar/nvarchar as well as char/bigint/int/decimal. My understanding is that the default page size in SQL Server is 8 KB, but can be manually changed. Also, that varchar/nvarchar columns are exempt from this restriction and are often (always?) moved to a separate location, a process called row overflow. Even so, MS documentation states that row-overflow data will degrade performance: "querying and performing other select operations, such as sorts or joins on large records that contain row-overflow data slows processing time, because these records are processed synchronously instead of asynchronously"

They recommend moving large columns into joinable metadata tables. "This can then be queried in an asynchronous JOIN operation".
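The split they describe might look like the following sketch (all table and column names here are hypothetical): keep the hot, narrow columns in the base table and move the large, rarely-read columns into a 1:1 side table keyed to it.

```sql
-- Narrow base table: frequently-queried columns only
create table Product (
  ProductId bigint primary key,
  Name      varchar(200),
  Price     decimal(10,2)
);

-- 1:1 side table holding the wide, rarely-read columns
create table ProductDetail (
  ProductId   bigint primary key
              references Product(ProductId),
  Description nvarchar(max),
  Notes       varchar(max)
);

-- Join only when the wide data is actually needed
select p.Name, d.Description
from Product p
join ProductDetail d on d.ProductId = p.ProductId;
```

Queries that touch only the base table then read far fewer pages per row.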

My question is: is it worth enlarging the page size to accommodate the wide columns, and are there other performance problems that would come up? If I didn't do that and instead partitioned the table into one or more metadata tables, and the tables got "big", as in the 100MM-record range, wouldn't the cost of joining the partitioned tables far outweigh the benefits? Also, if SQL Server is on a single-core machine (or on SQL Azure), my understanding is that parallelism is disabled, so would that also eliminate the benefit of moving the columns into separate tables, given that the join would no longer be asynchronous? Any other strategies that you'd recommend?

EDIT: Per the great comments below and some additional reading (that I should've done originally), you cannot manually alter the SQL Server page size. Also, related SO post: How do we change the page size of SQL Server?. There's an additional great answer there from @remus-rusanu.

Upvotes: 1

Views: 3128

Answers (3)

Diego

Reputation: 36146

  • you can't change the default 8k page size
  • varchar and nvarchar are treated like any other field, unless they are (max), in which case they are stored a bit differently, because their values can exceed the size of a page - something that is not possible with any other datatype

For example, if you try to execute this statement:

create table test_varchars(
  a varchar(8000),
  b varchar(8001),
  c nvarchar(4000),
  d nvarchar(4001)
)

Columns a and c are fine because both of them are at most 8000 bytes in length.

But, you would get the following errors on columns b and d:

The size (8001) given to the column 'b' exceeds the maximum allowed for any data type (8000).
The size (4001) given to the parameter 'd' exceeds the maximum allowed (4000).

because both of them exceed the 8000-byte limit. (Remember that the n in front of varchar or char means Unicode, which occupies double the space.)
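By contrast, declaring those columns as (max) types compiles fine, since (max) values that don't fit on the page can be stored off-row:

```sql
-- This succeeds: (max) values larger than a page are moved off-row
create table test_max_varchars(
  b varchar(max),
  d nvarchar(max)
)
```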

Upvotes: 1

Remus Rusanu

Reputation: 294307

My understanding is that the default page size in SQL is 8k, but can be manually changed

The 'large pages' setting refers to memory allocations, not to changing the database page size. See SQL Server and Large Pages Explained. I'm afraid your understanding is a little off.

As general, non-specific advice: for wide fixed-length columns the best strategy is to deploy row compression. For nvarchar, Unicode compression can help a lot. For specific advice, you need to measure. What is the exact performance problem you encountered? How did you measure it? Did you use a methodology like Waits and Queues to identify the bottlenecks, and are you positive that row size and off-row storage are the issue? It seems to me that you used the other 'methodology'...
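As a sketch (the table name is hypothetical), row compression is enabled with a rebuild, and you can estimate the savings before committing to it; Unicode compression is applied automatically to nvarchar columns once ROW or PAGE compression is on:

```sql
-- Estimate the savings first
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'WideTable',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'ROW';

-- Enable row compression on the hypothetical wide table
ALTER TABLE dbo.WideTable
REBUILD WITH (DATA_COMPRESSION = ROW);
```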

Upvotes: 3

Mark S. Rasmussen

Reputation: 35476

You cannot change the page size.

varchar(x) and (MAX) values are moved off-row when necessary - that is, when there isn't enough space on the page itself. If you have lots of large values, it may indeed be more effective to move them into other tables and then join them onto the base table - especially if you're not always querying for that data.
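If you'd rather not restructure, a lighter-weight option is to force (max) values out of row unconditionally, so the base rows stay narrow (the table name here is hypothetical):

```sql
-- Always store (max) values off-row, keeping in-row data compact
EXEC sp_tableoption 'dbo.WideTable', 'large value types out of row', 1;
```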

There is no concept of synchronously and asynchronously reading that off-row data. When you execute a query, it's run synchronously. You may have parallelization but that's a completely different thing, and it's not affected in this case.

Edit: To give you more practical advice you'll need to show us your schema and some realistic data characteristics.

Upvotes: 6
