Reputation: 1062945
On SQL Server configured as case-insensitive, group by
can have interesting results when the [n][var]char
column is not the first group by
column. Essentially, it looks like whatever row it encounters "first" (where "first" is undefined in the absence of an order): wins for that grouping. For example:
select x.[day], x.[name], count(1) as [count]
from (
select 1 as [day], 'a' as [name]
union all select 1, 'A'
union all select 2, 'A'
union all select 2, 'a'
) x group by x.[day], x.[name]
which returns, for me:
day name count
----------- ---- -----------
1 A 2
2 a 2
Using min(x.[name])
has no effect, since the grouping already happened.
I can't add an order by
before the group by
, as that is illegal; and adding the order by
after the group by
just defines the output order after the grouping - it still gives a
and A
.
So: is there a sane way of doing this where the capitalization will at least be consistent for all the groupings? (I'll leave for another day the problem of being consistent for separate runs)
Desired output, either:
day name count
----------- ---- -----------
1 A 2
2 A 2
or:
day name count
----------- ---- -----------
1 a 2
2 a 2
Edit: without destroying capitalisation when it is consistent between groups. So no upper/lower. So if one of the groups consistently has the value BcDeF
, I want the result of that row to be BcDeF
, not bcdef
or BCDEF
.
Upvotes: 5
Views: 653
Reputation: 44776
Use a case insensitive collation in the Group by
, e.g:
select day, name, count(*)
from tablename
group by day, name collate SQL_Latin1_General_Cp1_CI_AS_KI_WI
Perhaps SQL Server has problems here? Using another dbms it executes as:
SQL>create table t (d int, name varchar(10));
SQL>insert into t values (1,'A');
SQL>insert into t values (2,'A');
SQL>insert into t values (2,'a');
SQL>insert into t values (3,'BcDeF');
SQL>insert into t values (3,'bCdEf');
SQL>insert into t values (4,'a');
SQL>select d, name, count(*)
SQL&from t
SQL&group by d, name collate english_1;
d name
=========== ========== ====================
1 A 1
2 A 2
3 BcDeF 2
4 a 1
4 rows found
Where english_1 is a case insensitive collation.
As expected?
Upvotes: 1
Reputation: 70648
I would use windowing functions for this. By using ROW_NUMBER
and partitioning using a case insensitive collation, but ordering by a case sensitive one, we will choose consistently one result with the original capitalisation, but it will group them as if they are the same:
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY [day], [name]
ORDER BY [name] COLLATE SQL_Latin1_General_Cp1_Cs_AS),
N = COUNT(*) OVER(PARTITION BY [day], [name])
FROM ( select 1 as [day], 'a' as [name]
union all select 1, 'A'
union all select 2, 'A'
union all select 2, 'a'
union all select 3, 'BcDeF'
union all select 3, 'bCdEf') X
)
SELECT *
FROM CTE
WHERE RN = 1;
It returns:
╔═════╦═══════╦════╦═══╗
║ day ║ name ║ RN ║ N ║
╠═════╬═══════╬════╬═══╣
║ 1 ║ A ║ 1 ║ 2 ║
║ 2 ║ A ║ 1 ║ 2 ║
║ 3 ║ BcDeF ║ 1 ║ 2 ║
╚═════╩═══════╩════╩═══╝
Following @AndriyM's comment, if you want the same capitalisation over the whole result set, and not just the same day, you can use:
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY [day], [name]
ORDER BY [name] COLLATE SQL_Latin1_General_Cp1_Cs_AS),
N = COUNT(*) OVER(PARTITION BY [day], [name])
FROM ( select 1 as [day], 'a' as [name]
union all select 1, 'A'
union all select 2, 'A'
union all select 2, 'a'
union all select 3, 'BcDeF'
union all select 3, 'bCdEf') X
)
SELECT [day],
MAX([name] COLLATE SQL_Latin1_General_Cp1_CS_AS) OVER (PARTITION BY [name]) [name],
N
FROM CTE
WHERE RN = 1;
Upvotes: 8
Reputation: 4053
You can use UPPER
in GROUP BY
clause to transfer all the values to same capitalization.
Upvotes: 0
Reputation: 1269953
Use upper()
or lower()
:
select x.[day], lower(x.[name]) as name, count(1) as [count]
from (
select 1 as [day], 'a' as [name]
union all select 1, 'A'
union all select 2, 'A'
union all select 2, 'a'
) x
group by x.[day], x.[name];
You are correct that SQL Server chooses a value from an indeterminate row. min()
and max()
don't help, because the values are equivalent. The simplest solution is to explicitly choose the case that you want.
Upvotes: 1