Elio Fernandes
Elio Fernandes

Reputation: 1420

retrieve string between html tags using regex

How can I get the string between the html tags <p> and </p> using regular expressions?

'<div class="ExternalClass849E0BFE74914F8BB79B2C64E3D07AE6"><p>​Just help me!</p></div>'

Regards, Elio Fernandes

Upvotes: 1

Views: 1813

Answers (5)

Mikael Eriksson
Mikael Eriksson

Reputation: 138960

If your data can tolerate the transfer to XML you can do like this:

declare @s varchar(1000) = '<div class="ExternalClass849E0BFE74914F8BB79B2C64E3D07AE6"><p>​Just help me!</p></div>';

select cast(@s as xml).value('(//p/text())[1]', 'varchar(1000)');

Upvotes: 5

Ste Bov
Ste Bov

Reputation: 856

its hacky as sin

But if you want to be able to find potentially different entries across a web page not just a single

where @search is the name of the div and @toSearch is the html content

 declare @search nvarchar(200) = 'ExternalClass849E0BFE74914F8BB79B2C64E3D07AE6'

 declare @toSearch nvarchar(4000)
 = 'Lorem ipsum dolor sit amet.<div class="ExternalClass849E0BFE74914F8BB79B2C64E3D07AE6"><p>Just help me!</p></div>Lorem ipsum dolor sit amet.'

 DECLARE @start int
    ,@end int


 select @start = CHARINDEX('<p>',@toSearch,CHARINDEX(@search,@toSearch)) +3
 select @end = CHARINDEX('</p>',@toSearch,CHARINDEX(@search,@toSearch))

 select SUBSTRING(@toSearch,@start,@end - @start)

Upvotes: 1

C Scupski
C Scupski

Reputation: 36

You should be able to use a combination of PATINDEX(find,search) and SUBSTRING(string,start,length) to extract the text between the p tags.

DECLARE @myString VARCHAR(max)
SET @myString='<div class="ExternalClass849E0BFE74914F8BB79B2C64E3D07AE6"><p>​Just help me!</p></div>'
SELECT SUBSTRING(@myString,PATINDEX('%<p>%',@myString)+3,PATINDEX('%</p>%',@myString)-PATINDEX('%<p>%',@myString)-3)

Will yield a result of 'Just Help me!'

Depending on what you are trying to use that text for though, it may be easier to extract it elsewhere.

Upvotes: 0

Elio Fernandes
Elio Fernandes

Reputation: 1420

I found an answer using CHARINDEX, but if there a way to do it with REGEX, you are welcome!

DECLARE @Obs  nvarchar(400)

SET @Obs = '<div class="ExternalClass849E0BFE74914F8BB79B2C64E3D07AE6"><p>​Just help me!</p></div>'

SELECT 
CASE WHEN CHARINDEX('<p>', [Obs])  <= 0 THEN ''
            ELSE SUBSTRING([Obs],CHARINDEX('<p>', [Observacoes]) + 3,CHARINDEX('</p>', [Obs]) - CHARINDEX('<p>', [Obs]) - 3) END AS OBS
FROM mytable

Upvotes: 1

This isn't something you generally want to be doing, but it can be achieved for simple strings.

DECLARE @s VARCHAR(1000) = '<div class="ExternalClass849E0BFE74914F8BB79B2C64E3D07AE6"><p>​Just help me!</p></div>';

SELECT SUBSTRING(@s, 
                 CHARINDEX('<p>', @s) + LEN('<p>'), 
                 CHARINDEX('</p>', @s, CHARINDEX('<p>', @s))
                 - LEN('</p>') - CHARINDEX('<p>', @s)
                 )

This code uses SUBSTRING and CHARINDEX to look for the position of the tags you're interested in.

First you locate <p>, then you locate the first instance of </p> after it. The use of LEN is to move the start and end points to not include the tags you're searching between.

Hope this helps!

Upvotes: 1

Related Questions