Reputation: 77
I am trying to group by the column with same exact word or same first two words or same first three words
tblsample:
name
Brown sugar
Brown sugar Creme
Tiger zoo
Tiger
Blue red green
Blue red green yellow
Expected result:
name cntnme
Brown sugar 2
Tiger zoo 1
Tiger 1
Blue red green 2
I only can group by exact same words.
Select name, count(name) as cntnme from tblsample group by name
Upvotes: 4
Views: 478
Reputation: 5932
The following is an example of how to do this
create table dbo.t(x varchar(30))
insert into dbo.t values('Brown sugar')
insert into dbo.t values('Brown sugar Creme')
insert into dbo.t values('Tiger zoo')
insert into dbo.t values('Tiger')
insert into dbo.t values('Blue red green')
insert into dbo.t values('Blue red green yellow')
insert into dbo.t values('Blue red green yellow green')
--First i identify the records that have more than one word using space as a delimiter and then i look them up with words that match using a like condition.
with data
as (select x, x as parent_word
from dbo.t
where charindex(' ',x) <> 0
union all
select b.x,a.parent_word
from data a
join dbo.t b
on a.x <> b.x
and b.x like concat(a.x,'%')
)
--In this block i find out the parent_word associated wit each of the words, and use min to identify them out. In this example (Blue,red green yellow),(Blue,red,green,yellow,green) and (Blue,red,gree) would all be segregated by the lowest value of parent which is (Blue,red,green)
,data_parent_word
as (
select x,min(parent_word) as min_value
from data
group by x
)
select min_value as word_val, count(x) as cnt
from data_parent_word
group by min_value
union all
select x,1
from dbo.t
where charindex(' ',x) = 0
Finally i do a union all to get the records of what i obtained with the list of "single word"
This gives the output as follows
+----------------+----------+
| word_val | cnt |
+----------------+----------+
| Blue red green | 3 |
| Brown sugar | 2 |
| Tiger zoo | 1 |
| Tiger | 1 |
+----------------+----------+
Upvotes: 1
Reputation: 1270713
I am trying to group by the column with same exact word or same first two words or same first three words.
This suggests an approach like this:
with words2_3 as (
select t.*
from t
where name like '% %' and name not like '% % % %'
)
select w.name, count(*)
from t join
words2_3 w
on t.name + ' ' like w.name + ' %'
group by w.name;
Here is a db<>fiddle.
Your results are inconsistent with your problem statement, so these results do not agree with your statement.
Upvotes: 1
Reputation: 8829
Sounds like you want to use the SOUNDEX (Transact-SQL) operator, e.g.:
create table dbo.Names (
Name nvarchar(50) not null
);
insert dbo.Names values
('Brown sugar'),
('Brown sugar Creme'),
('Tiger zoo'),
('Tiger'),
('Blue red green'),
('Blue red green yellow'),
('Blue red green yellow orange');
select Name=min(Name), CntNme=count(1)
from dbo.Names
group by soundex(Name);
Name CntNme
--------------- ------
Blue red green 3
Brown sugar 2
Tiger 2
Upvotes: 3