user1246770

Reputation: 1

RegEx pattern to extract URLs

I need to extract everything between these characters:

<a href="/url?q=(text to extract whatever it is)&amp

I tried this pattern, but it's not working for me:

/(?<=url\?q=).*?(?=&amp)/

I'm programming in VB.NET. This is my code, but I think the problem is that the pattern is wrong:

    Dim matches As MatchCollection
    matches = regex.Matches(TextBox1.Text)

    For Each Match As Match In matches
        ListBox1.Items.Add(Match.Value)
    Next

Could you help me please?

Upvotes: 0

Views: 2867

Answers (2)

Stefan Đorđević

Reputation: 595

The regex below extracts http, https, and ftp URLs from your text (or any other):

(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?

Upvotes: 0

Oleks

Reputation: 32323

Your regex seems correct, except for the slashes (/) at the beginning and end of the expression. Remove them:

Dim regex = New Regex("(?<=url\?q=).*?(?=&amp)")

and it should work.

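Some utilities and languages (sed, Perl, and JavaScript, for example) use / (forward slash) to delimit the search expression; others may use quotes. With System.Text.RegularExpressions.Regex you don't need delimiters: the pattern is passed as an ordinary string, so the slashes are treated as literal characters that must appear in the input, and the pattern never matches.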
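To check the corrected pattern outside the form, here is a minimal console sketch. The sample `html` string, the module name `ExtractQueryUrls`, and the example.com/example.org addresses are made up for illustration; only the pattern comes from the question.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractQueryUrls
    Sub Main()
        ' Hypothetical sample input, shaped like the markup in the question.
        Dim html As String = _
            "<a href=""/url?q=http://example.com/page&amp;sa=U"">One</a>" & _
            "<a href=""/url?q=https://example.org/&amp;sa=U"">Two</a>"

        ' No surrounding slashes: .NET would treat them as literal characters.
        ' (?<=url\?q=) requires "url?q=" just before the match,
        ' .*? takes as little as possible,
        ' (?=&amp) stops right before the next "&amp".
        Dim pattern As New Regex("(?<=url\?q=).*?(?=&amp)")

        For Each m As Match In pattern.Matches(html)
            Console.WriteLine(m.Value)
        Next
    End Sub
End Module
```

With this sample input, it should print `http://example.com/page` and `https://example.org/`, one per line. In the original form code, the same `Regex` instance would be used with `TextBox1.Text` in place of `html`.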

Upvotes: 2
