Thalia

Reputation: 14615

Extract pattern from string, with special characters, using Regular Expressions

I am trying to use a regex in VB.NET (the language probably shouldn't matter, though) to extract something reasonable out of a very long file name: "\\path\path\path.path.path\path\some_more_stuff_from a name.item_123_456.html"

From that whole mess, I would like to extract "item_123_456".

Would it make sense to take everything before a pattern like ".html" and then, from that, everything after the last dot?
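That two-step idea can also be done without a regular expression. Below is a minimal VB.NET sketch using only String methods. The module and function names are hypothetical, and it assumes ".html" occurs once, at the end of the name.

```vbnet
Imports System

Module TwoStepDemo
    ' Sample file name copied from the question.
    Public Const FileName As String = "\\path\path\path.path.path\path\some_more_stuff_from a name.item_123_456.html"

    ' Hypothetical helper. Step 1: keep everything before ".html".
    ' Step 2: keep everything after the last dot in what remains.
    ' Returns Nothing when the input contains no ".html".
    Public Function ExtractBetweenLastDotAndHtml(fileText As String) As String
        Dim htmlPos As Integer = fileText.LastIndexOf(".html", StringComparison.Ordinal)
        If htmlPos < 0 Then Return Nothing
        Dim beforeHtml As String = fileText.Substring(0, htmlPos)
        ' LastIndexOf returns -1 when there is no dot, so Substring(0) keeps the whole string.
        Return beforeHtml.Substring(beforeHtml.LastIndexOf("."c) + 1)
    End Function

    Sub Main()
        Console.WriteLine(ExtractBetweenLastDotAndHtml(FileName)) ' item_123_456
    End Sub
End Module
```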

I have tried to get at least part of it (the entire string up to .html), but I still get no matches:

Dim matches As MatchCollection
Dim regexStuff As New Regex(".*\\.html")
matches = regexStuff.Matches(strINeed)
Dim successfulMatch As Match
For Each successfulMatch In matches
  strFound = successfulMatch.Value
Next

I also experimented with Regex("\\..*\\.html"), hoping to get everything between a dot and ".html", but it returned Nothing as well.

I just can't get regular expressions to work...
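One likely culprit worth checking: unlike C#, VB.NET string literals do not process backslash escapes, so a pattern written as ".*\\.html" reaches the regex engine with both backslashes intact. There, \\ means a literal backslash and the following . means any one character, so the pattern only matches text containing a backslash, one character, then "html", which this file name never has. The minimal sketch below, assuming the sample file name from the question (the module name is just for illustration), compares the two spellings.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EscapeDemo
    ' Sample file name copied from the question.
    Public Const FileName As String = "\\path\path\path.path.path\path\some_more_stuff_from a name.item_123_456.html"

    Sub Main()
        ' VB.NET string literals have no backslash escapes, so "\\." reaches the
        ' regex engine unchanged: a literal backslash followed by any one character.
        Console.WriteLine(Regex.IsMatch(FileName, ".*\\.html"))  ' False
        ' A single backslash escapes the dot, so \. matches a literal ".".
        Console.WriteLine(Regex.IsMatch(FileName, ".*\.html"))   ' True
    End Sub
End Module
```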

Upvotes: 0

Views: 463

Answers (2)

user557597

It could probably be generalized into this:

[^.\\]+\.html

Edit: or, if the initial dot is required:

\.[^.\\]+\.html
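As a rough VB.NET illustration (the sample file name and the ExtractItem helper are additions here, not part of the answer): [^.\\]+ matches a run of characters that are neither dots nor backslashes, so the first place it can be followed by ".html" is the last name segment. The patterns go into VB.NET string literals exactly as written. Wrapping the run in parentheses, also an addition, lets you read the name without the ".html" suffix.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LastSegmentDemo
    ' Sample file name copied from the question.
    Public Const FileName As String = "\\path\path\path.path.path\path\some_more_stuff_from a name.item_123_456.html"

    ' Hypothetical helper: the answer's first pattern with a capture group added.
    ' [^.\\]+ is one or more characters that are neither a dot nor a backslash.
    Public Function ExtractItem(text As String) As String
        Dim m As Match = Regex.Match(text, "([^.\\]+)\.html")
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        Console.WriteLine(Regex.Match(FileName, "[^.\\]+\.html").Value)    ' item_123_456.html
        Console.WriteLine(Regex.Match(FileName, "\.[^.\\]+\.html").Value)  ' .item_123_456.html
        Console.WriteLine(ExtractItem(FileName))                           ' item_123_456
    End Sub
End Module
```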

Upvotes: 1

Ghost

Reputation: 2226

.*\.(.*?)\.html

This matches as many characters as possible with .*, then backs off until the rest of the string can match a dot, followed by as few characters as possible, followed by .html (that is, \.(.*?)\.html).
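To see why the leading .* matters, here is a small VB.NET sketch that runs the pattern with and without it (FirstGroup and the sample file name are hypothetical, chosen for illustration). Because .* first consumes the whole string and then gives characters back, the \. lands on the dot right before item_123_456; without it, the search starts at the first dot anywhere in the path and the lazy group stretches all the way to ".html".

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyPrefixDemo
    ' Sample file name copied from the question.
    Public Const FileName As String = "\\path\path\path.path.path\path\some_more_stuff_from a name.item_123_456.html"

    ' Hypothetical helper: capture group 1 of the first match, or Nothing if no match.
    Public Function FirstGroup(text As String, pattern As String) As String
        Dim m As Match = Regex.Match(text, pattern)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        ' Greedy prefix: \. ends up on the dot just before the last segment.
        Console.WriteLine(FirstGroup(FileName, ".*\.(.*?)\.html"))  ' item_123_456
        ' No prefix: the match begins at the first dot in the whole path.
        Console.WriteLine(FirstGroup(FileName, "\.(.*?)\.html"))    ' path.path\path\some_more_stuff_from a name.item_123_456
    End Sub
End Module
```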

It places the text between ".html" and the dot preceding it into a capturing group, which is group 1 (written $1 in a replacement string). If you need the VB.NET code for that, I can likely provide it as well, but the rest of your code looked okay.
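The $1 syntax applies when you use Regex.Replace rather than reading Groups(1) directly. A minimal sketch, assuming you want the whole input replaced by just the captured name; the ^ and $ anchors and the KeepGroupOne helper are additions for illustration, not part of the answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplaceDemo
    ' Sample file name copied from the question.
    Public Const FileName As String = "\\path\path\path.path.path\path\some_more_stuff_from a name.item_123_456.html"

    ' Hypothetical helper: replaces the entire input with capture group 1 ($1).
    ' ^ and $ make the match cover the whole string. If nothing matches,
    ' Regex.Replace returns the input unchanged.
    Public Function KeepGroupOne(text As String) As String
        Return Regex.Replace(text, "^.*\.(.*?)\.html$", "$1")
    End Function

    Sub Main()
        Console.WriteLine(KeepGroupOne(FileName))  ' item_123_456
    End Sub
End Module
```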

Your VB.NET code should look something like this:

Dim matches As MatchCollection
Dim regexStuff As New Regex(".*\.(.*?)\.html")
matches = regexStuff.Matches(strINeed)
If matches.Count > 0 Then
    ' Group 1 holds the text between ".html" and the dot before it
    strFound = matches.Item(0).Groups(1).Value
End If

Upvotes: 1
