user3786581
user3786581

Reputation: 421

How to get specific word from the string column using SQL Server

I have a table.

create table tblProduct
(
    ProductID int primary key identity(1000,1),
    ProductName varchar(100),
    ProductDescription nvarchar(max)
)

In this table, there are 1000 records like this...

ProductID=1001  
ProductName='Apple i6'  
ProductDescription='Lorem Ipsum is simply dummy text of the printing and typesetting industry. **Product of USA** Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.'

ProductID=1002  
ProductName='Micromax Canvas'  
ProductDescription='Scrambled it to make a type specimen bookLorem Ipsum is simply dummy text of the printing and typesetting industry. **Product of INDIA** Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.'

ProductID=1003  
ProductName='Oppo Z3'  
ProductDescription='Lorem Ipsum is simply dummy text of the printing and typesetting industry. **Product of INDIA** Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.'  

and so on....

Now I want to find only country name (no duplicate and no other word) from the productDescription column of the tblProduct...

Output must be something like this:

Country Name - Total(Group By)  
USA - 1  
INDIA - 2  

Note: "product of XXX*" will be available almost all rows of productdescription column.

*xxx is the country name.

Upvotes: 0

Views: 7759

Answers (1)

Lukasz Szozda
Lukasz Szozda

Reputation: 175716

You can use string manipulation functions LEFT/RIGHT/PATINDEX:

WITH cte AS
(
  SELECT 
   r = RIGHT(ProductDescription, LEN(ProductDescription) -  
                               PATINDEX('%Product of%' ,ProductDescription) - 10)
  FROM #tblProduct
  WHERE PATINDEX('%Product of%' ,ProductDescription) > 0
)
SELECT country = LEFT(r, CHARINDEX(' ', r)-1), COUNT(*) AS Total
FROM cte
GROUP BY LEFT(r, CHARINDEX(' ', r)-1);

LiveDemo

But you have to think about corner cases:

  • country name contains multiple words like Russian Federation
  • what with multiple names (U.S./USA/United States of America...) you will get multiple groups, you need data cleansing

Note: "product of XXX*" will be available almost all rows of productdescription column.

If you know all countries in advance it will be much easier. Just create table countries:

name      master 
'U.S.'    'USA'
'USA'     'USA'
'India'   'India'

SELECT c.master, COUNT(*) AS total
FROM #tblProduct p
JOIN countries c
  ON p.Description LIKE '%Product of ' + c.name + '%'
GROUP BY c.master;

EDIT:

WITH cte AS
(
SELECT r = RIGHT(ProductDescription, LEN(ProductDescription) -  PATINDEX('%Product of <strong>%' ,ProductDescription) - 18)
FROM #tblProduct
WHERE PATINDEX('%Product of <strong>%' ,ProductDescription) > 0
)
SELECT country = LEFT(r, CHARINDEX('<', r)-1), COUNT(*) AS Total
FROM cte
GROUP BY r,LEFT(r, CHARINDEX('<', r)-1);

LiveDemo2

Upvotes: 2

Related Questions