er3000
er3000

Reputation: 13

Substring a URL to the 5th slash

Could you help me sub-string a list of different URLs?

I can only understand how to sub-string a part of it, but could not manage to make it as requested - which is I need the URL to be full to the 5th slash (bold section), but some of the URLs don't have the 5th slash.

Example URLs:

'http://db-hit-internet/bags/personnel/default.axxxx'
'http://db-hit-internet/store/books/preview/default.axxxx'
'http://db-git-internet/friends/default.aspx?lang=LTT'

Expected output:

'http://db-hit-internet/bags/personnel'
'http://db-hit-internet/store/books/preview'
'http://db-git-internet/friends/default.aspx?lang=LTT'

I have query:

SELECT ('CS' + cast([id] as char (4))) AS name, [SysName], [Link], COUNT(*) AS Viewed  
  FROM main AS A
    INNER JOIN 
    (
    SELECT [LogDate], [LogPage] COLLATE Latin1_General_CI_AS PageName
    FROM web

    UNION
    SELECT [LogDate], [LogPage] PageName
    FROM web2

    )  
    ON A.Link= PageName
WHERE A.[Link] is not null 
GROUP BY A.id, A.[SysName], A.[Link] 

And I need the web and web2 union to have the URLs sub-stringed to the 5th slash. The problem is that there should be a CASE statement to check if the 5th slash exists and then according to that SUBSTRING and CHARINDEX should be included somewhere.

I tried:

LEFT([LogPage], CHARINDEX('/', [LogPage], CHARINDEX('/', [LogPage], CHARINDEX('/', [LogPage], CHARINDEX('//', [LogPage])+2)+1)+1))

But it only works with the URLs that have the 5th slash.

Upvotes: 1

Views: 255

Answers (2)

Zhorov
Zhorov

Reputation: 29993

One option is a JSON-based approach, which transforms the data into a valid JSON and parses this JSON with OPENJSON():

Table:

CREATE TABLE Data (url varchar(100))
INSERT INTO Data (url)
VALUES
   ('http://db-hit-internet/bags/personnel/default.axxxx'),
   ('http://db-hit-internet/store/books/preview/default.axxxx'),
   ('http://db-git-internet/friends/default.aspx?lang=LTT'),
   ('http://db-git-internet.net')

Statement:

SELECT CONCAT(j.part1, j.part2, j.part3, j.part4, j.part5) AS url
FROM Data d
CROSS APPLY OPENJSON(CONCAT('[["', REPLACE(STRING_ESCAPE(d.url, 'json'), '/', '/","'), '"]]')) WITH (
   part1 varchar(100) '$[0]',
   part2 varchar(100) '$[1]',
   part3 varchar(100) '$[2]',
   part4 varchar(100) '$[3]',
   part5 varchar(100) '$[4]'
) j

Result:

url                                                         url
http://db-hit-internet/bags/personnel/default.axxxx         http://db-hit-internet/bags/personnel/
http://db-hit-internet/store/books/preview/default.axxxx    http://db-hit-internet/store/books/
http://db-git-internet/friends/default.aspx?lang=LTT        http://db-git-internet/friends/default.aspx?lang=LTT
http://db-git-internet.net                                  http://db-git-internet.net

If you want a string-based approach, the following statement is a possible solution:

SELECT d.url, LEFT(d.url, v5.pos)
FROM Data d   
CROSS APPLY (SELECT CASE WHEN CHARINDEX('/', d.url, 1) = 0 THEN LEN(d.url) ELSE CHARINDEX('/', d.url, 1) END) v1 (pos)
CROSS APPLY (SELECT CASE WHEN CHARINDEX('/', d.url, v1.pos + 1) = 0 THEN LEN(d.url) ELSE CHARINDEX('/', d.url, v1.pos + 1) END) v2 (pos)
CROSS APPLY (SELECT CASE WHEN CHARINDEX('/', d.url, v2.pos + 1) = 0 THEN LEN(d.url) ELSE CHARINDEX('/', d.url, v2.pos + 1) END) v3 (pos)
CROSS APPLY (SELECT CASE WHEN CHARINDEX('/', d.url, v3.pos + 1) = 0 THEN LEN(d.url) ELSE CHARINDEX('/', d.url, v3.pos + 1) END) v4 (pos)
CROSS APPLY (SELECT CASE WHEN CHARINDEX('/', d.url, v4.pos + 1) = 0 THEN LEN(d.url) ELSE CHARINDEX('/', d.url, v4.pos + 1) END) v5 (pos)

Upvotes: 1

Arun Palanisamy
Arun Palanisamy

Reputation: 5469

You can also use CROSS APPLY.

SELECT URL,SUBSTRING(URL,1,CASE WHEN (TS.LOC!=0 AND FRS.LOC!=0 AND FVS.LOC!=0) THEN FVS.LOC 
                                ELSE LEN(URL) END) SUBURL 
FROM TEST
CROSS APPLY (VALUES(CHARINDEX('/',URL))) FS(LOC)
CROSS APPLY (VALUES(CHARINDEX('/',URL,FS.LOC+1))) SS(LOC)
CROSS APPLY (VALUES(CHARINDEX('/',URL,SS.LOC+1))) TS(LOC)
CROSS APPLY (VALUES(CHARINDEX('/',URL,TS.LOC+1))) FRS(LOC)
CROSS APPLY (VALUES(CHARINDEX('/',URL,FRS.LOC+1))) FVS(LOC)

Check Demo Here

Note : Please change the CASE conditions accordingly (Assumption: 2 slashes will always be there). This will give sub string till 5th slash.

Upvotes: 1

Related Questions