Reputation: 4059
I'm working on a system where we have a number of internally generated reference codes. These codes can all be represented using a simple, one-byte-per-character set. Additionally, the codes are always of a fixed length.
However, throughout the various databases and tables, the codes are stored as a mix of varchar and nvarchar. How do I find the detrimental costs of this practice? I can see actual variations in query plans when it is necessary to join from varchar to nvarchar or vice versa, but I don't know how to express this as a business cost.
My question has a number of parts:
The simplest example I've put together so far shows the difference in query plans, but of course, since it's so small, the query cost of each statement is negligible and the overall runtime is under a second:
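-- Source tables: the same two codes stored as char, varchar, and nvarchar.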
declare @Source1 table(Field1 char(4));
insert into @Source1 values ('1234'),('2345');
declare @Source2 table(Field1 varchar(4));
insert into @Source2 values ('1234'),('2345');
declare @Source3 table(Field1 nvarchar(4));
insert into @Source3 values (N'1234'),(N'2345');
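-- Consumer tables mirroring the three character types, plus a filter column.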
declare @Consumer1 table(Field1 char(4), FilterField int);
insert into @Consumer1 values ('1234', 5);
declare @Consumer2 table(Field1 varchar(4), FilterField int);
insert into @Consumer2 values ('1234', 5);
declare @Consumer3 table(Field1 nvarchar(4), FilterField int);
insert into @Consumer3 values (N'1234', 5);
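-- All nine join combinations; compare the resulting plans for implicit conversions.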
select * from @Consumer1 c inner join @Source1 s on c.Field1 = s.Field1 where c.FilterField = 5;
select * from @Consumer1 c inner join @Source2 s on c.Field1 = s.Field1 where c.FilterField = 5;
select * from @Consumer1 c inner join @Source3 s on c.Field1 = s.Field1 where c.FilterField = 5;
select * from @Consumer2 c inner join @Source1 s on c.Field1 = s.Field1 where c.FilterField = 5;
select * from @Consumer2 c inner join @Source2 s on c.Field1 = s.Field1 where c.FilterField = 5;
select * from @Consumer2 c inner join @Source3 s on c.Field1 = s.Field1 where c.FilterField = 5;
select * from @Consumer3 c inner join @Source1 s on c.Field1 = s.Field1 where c.FilterField = 5;
select * from @Consumer3 c inner join @Source2 s on c.Field1 = s.Field1 where c.FilterField = 5;
select * from @Consumer3 c inner join @Source3 s on c.Field1 = s.Field1 where c.FilterField = 5;
For additional info, we're using SQL Server 2014 in development and testing, and 2008 in the live environment. I don't expect any differences for this use case, but figured I'd ask whether that's something I need to consider as well?
Upvotes: 2
Views: 96
Reputation: 207
Implicit conversion is probably the single biggest issue. There are many articles and blog posts discussing the negative performance effects of implicit data conversions.

If the codes are always fixed length and fairly small (e.g. always 4 characters), then char or nchar would be better choices. Storage space can become an issue when the tables are large: a char(4) column occupies exactly 4 bytes in the record, while a varchar(4) requires the 4 bytes plus a couple more of overhead, and nchar/nvarchar basically double the space requirement. Codes of this sort are often part of index keys, so the space can add up quickly. They are also frequently referenced in conditional clauses (WHERE, ORDER BY, CASE, etc.), so the implicit conversion can become a major performance issue.

Personally, I would investigate these columns, try to migrate to the char datatype in the root or parent tables, and be VERY consistent in how they are referenced in code. You should also look at other, similar situations; for example, char/varchar to/from integer is a very common problem.
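To make the storage point concrete, here's a quick sketch (the variable names are just for illustration) comparing the bytes each type uses for one of these 4-character codes:

-- Illustrative only: bytes used by the same 4-character code in each type.
declare @c   char(4)     = '1234',
        @vc  varchar(4)  = '1234',
        @nvc nvarchar(4) = N'1234';

select datalength(@c)   as char_bytes,     -- 4
       datalength(@vc)  as varchar_bytes,  -- 4, plus ~2 bytes of variable-length overhead once stored in a row
       datalength(@nvc) as nvarchar_bytes; -- 8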
Upvotes: 1
Reputation: 453648
nvarchar will generally use twice the space of varchar.
Exceptions are if you are using page or row compression (in which case you get Unicode compression thrown in) or double byte collations (in which case varchar can take two bytes per character too for some characters).
The main problem will be that nvarchar has higher data type precedence than varchar, so if joining the two, the varchar side will need to be implicitly cast to nvarchar, preventing, or at least hindering, index usage on that side.
(It is collation-dependent whether or not SQL Server can still manage a dynamic seek despite the implicit cast, so it might not entirely rule out index usage for a nested loops join, but I doubt that you could get a merge join on two indexes of the two different datatypes without an intermediate sort.)
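A minimal sketch of that precedence behaviour (the temp table names here are just illustrative, not from the question): join an indexed varchar column to an indexed nvarchar one and look at the actual plan for the CONVERT_IMPLICIT on the varchar side.

-- Illustrative repro: two indexed columns of mismatched character types.
create table #V (Code varchar(4) primary key);
create table #N (Code nvarchar(4) primary key);

insert into #V values ('1234'), ('2345');
insert into #N values (N'1234'), (N'2345');

-- nvarchar has the higher precedence, so the plan wraps v.Code in
-- CONVERT_IMPLICIT(nvarchar(4), ...); depending on the collation you may
-- still get a range seek on #V, but not a clean seek or merge join on the
-- two mismatched indexes without extra work.
select *
from #N as n
inner join #V as v
    on v.Code = n.Code;

drop table #V;
drop table #N;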
Upvotes: 4
Reputation: 2254
One of the more obvious effects is that differing datatypes prevent the use of foreign key constraints.
Drop a few fundamental rows from a parent table while leaving all the child rows untouched, and show the result to the boss.
You will get the answer 'nobody would do such a dumb delete', so be prepared.
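A rough sketch of that constraint problem (table and constraint names are purely illustrative): trying to declare the foreign key across the mismatched types is rejected with a data type mismatch error, so the relationship goes unenforced.

-- Illustrative names only: the FK is rejected because the referencing
-- varchar column is not the same data type as the referenced nvarchar column.
create table dbo.ParentCodes (Code nvarchar(4) not null primary key);

create table dbo.ChildRows
(
    Id   int identity(1,1) primary key,
    Code varchar(4) not null,
    constraint FK_ChildRows_ParentCodes
        foreign key (Code) references dbo.ParentCodes (Code)
);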
Upvotes: 1