This string is automatically generated with an application I can't access or change: "http://www.site.com/locale=euen&mag=testit&issue=322&page=5&template=testit-t1"
I need to change the string to
where:
But it could also be a different issue or page number, it's not known beforehand.
How do I do this in VB.NET? (It has to be VB.NET) I've tried things with split and compare, but I'm a disaster with dissecting strings. Help would be most welcome!
EDIT:
After trying Konrad's solution below, I get an error when I run the string through it. All the other URLs keep working fine, but as soon as I pass in one in the format that needs to be converted, it fails.
I suspect this is because the conversion function is part of yet another function, and I'm doing something wrong when trying to put the regex function in. This is the complete function:
Function ExpandLine(ByRef sLine, ByVal nStart)
'Purpose: adapt expandLine into a function that replaces
' ' the URLs from the UNIT with redirects
' '
' ' Purpose: This function searches recursively
' ' for strings embedded in "{" and "}" pairs.
' ' These strings contain a left and right part
' ' separated by ";". The left part will be
' ' hyperlinked with the right part.
' '
' ' Input: sLine - string to be expanded
' ' nStart - where to start the expansion from
' ' the right (normally set to -1)
' '
' ' Output: sLine - expanded string
' '
' ' Example: This line contains a {hyperlink;http://www.site.com}
' ' that points to the homepage
Dim n, n1, n2 As Integer
Dim sUrl As String
If nStart <> 0 Then
n = InStrRev(sLine, "{", nStart)
If n <> 0 Then
n1 = InStr(n, sLine, ";")
n2 = InStr(n, sLine, "}")
If Not (n1 = 0 Or n2 = 0) Then
sUrl = Mid(sLine, n1 + 1, n2 - n1 - 1)
'use RegEx to determine if it's a UNIT URL
Const TestPattern = _
"^http://[^/]+/locale=[^&]+&mag=[^&]+&issue=[^&]+&page=[^&]+&template=[^&]+$"
Dim conformsToPattern = Regex.IsMatch(sUrl, TestPattern)
If conformsToPattern Then
Const SitePattern = "(http://[^/]+)/"
Const IssuePattern = "issue=(\d+)"
Const PagePattern = "page=(\d+)"
Dim sSite = Regex.Match(sUrl, SitePattern).Groups(1).Value
Dim sIssue = Regex.Match(sUrl, IssuePattern).Groups(1).Value
Dim sPage = Regex.Match(sUrl, PagePattern).Groups(1).Value
sUrl = String.Format("{1}/{2}_{3}", sSite, sIssue, sPage)
End If
sLine = _
Left(sLine, n - 1) & "<a class=""smalllink"" target=""_new"" href=""" & _
sUrl & """>" & Mid(sLine, n + 1, n1 - n - 1) & "</a>" & _
Right(sLine, Len(sLine) - n2)
ExpandLine(sLine, n - 1)
End If
End If
End If
End Function
Is the problem in the following line?
sUrl = String.Format("{1}/{2}_{3}", sSite, sIssue, sPage)
Upvotes: 2
Views: 1201
Reputation: 545588
You want regular expressions:
Const SitePattern = "(http://[^/]+)/"
Const IssuePattern = "issue=(\d+)"
Const PagePattern = "page=(\d+)"
Dim site = Regex.Match(input, SitePattern).Groups(1).Value
Dim issue = Regex.Match(input, IssuePattern).Groups(1).Value
Dim page = Regex.Match(input, PagePattern).Groups(1).Value
Dim result = String.Format("{0}/{1}_{2}", site, issue, page)
This searches, respectively, for the name of the website domain (including the leading http://, and delimited by the first following forward slash), the number that follows the issue parameter, and the number that follows the page parameter.
It then constructs the result string from these three findings.
Searching for numbers in regular expressions is done via \d+, where \d matches any digit and + tells the engine to match one or more of them.
For the web site, we allow any character except the forward slash ([^/]: this is a character group, and the leading ^ tells the engine to negate the group, i.e. match everything not in it).
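To see the three patterns at work, here is a minimal sketch that runs them on the sample URL from the question. The module and function names (UrlRewriteSketch, RewriteUrl) are made up for illustration. Note that String.Format placeholders are zero-based, so three arguments are addressed as {0}, {1} and {2}; a placeholder such as {3} with only three arguments throws a FormatException.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UrlRewriteSketch
    ' Hypothetical helper name; the three patterns are the ones discussed above.
    Function RewriteUrl(ByVal input As String) As String
        Const SitePattern As String = "(http://[^/]+)/"
        Const IssuePattern As String = "issue=(\d+)"
        Const PagePattern As String = "page=(\d+)"

        ' Groups(0) is the whole match; Groups(1) is the first parenthesized part.
        Dim site As String = Regex.Match(input, SitePattern).Groups(1).Value
        Dim issue As String = Regex.Match(input, IssuePattern).Groups(1).Value
        Dim page As String = Regex.Match(input, PagePattern).Groups(1).Value

        ' Placeholders are zero-based: {0} = site, {1} = issue, {2} = page.
        Return String.Format("{0}/{1}_{2}", site, issue, page)
    End Function

    Sub Main()
        Dim url As String = "http://www.site.com/locale=euen&mag=testit&issue=322&page=5&template=testit-t1"
        Console.WriteLine(RewriteUrl(url)) ' prints http://www.site.com/322_5
    End Sub
End Module
```

If a pattern does not match, Groups(1).Value is an empty string rather than an error, so it is worth checking the input first when other kinds of URL can reach this code.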
EDIT: If you first want to test whether the input actually conforms to your pattern, you may do the following. Notice, however, that this test is sensitive to the order of the GET parameters and I’d take this as a warning sign to do it differently: since the order of GET parameters in a URL isn’t important, can you guarantee that it will stay the same?
Const TestPattern = "^http://[^/]+/locale=[^&]+&mag=[^&]+&issue=[^&]+&page=[^&]+&template=[^&]+$"
Dim conformsToPattern = Regex.IsMatch(input, TestPattern)
If conformsToPattern Then
' Yes, go ahead. '
Else
' Nope, leave it unchanged. '
End If
This just checks that the whole string (from start = ^ to end = $) is matched by the pattern. The variable parameter values are all encoded as [^&]+, i.e. one or more characters other than & (which is the delimiter of the parameters).
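If the parameter order cannot be guaranteed, one alternative is to split the parameters into a dictionary and look them up by name. The following is only a sketch under a few assumptions: the names OrderIndependentSketch and RewriteIfUnitUrl are hypothetical, the parameters follow the first slash after the host directly (as in the sample URL, which has no "?"), and the same five parameters are required as in the test pattern.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module OrderIndependentSketch
    ' Hypothetical helper: returns the rewritten URL, or the input unchanged
    ' when it does not look like one of the URLs that need converting.
    Function RewriteIfUnitUrl(ByVal url As String) As String
        If Not url.StartsWith("http://", StringComparison.Ordinal) Then Return url

        ' First slash after "http://" (7 characters) ends the host part.
        Dim slash As Integer = url.IndexOf("/"c, 7)
        If slash <= 7 Then Return url
        Dim site As String = url.Substring(0, slash)

        ' Collect name=value pairs, whatever order they arrive in.
        Dim parameters As New Dictionary(Of String, String)
        For Each pair As String In url.Substring(slash + 1).Split("&"c)
            Dim parts As String() = pair.Split(New Char() {"="c}, 2)
            If parts.Length = 2 AndAlso parts(1).Length > 0 Then
                parameters(parts(0)) = parts(1)
            End If
        Next

        ' Assumption: all five parameters from the test pattern must be present.
        For Each key As String In New String() {"locale", "mag", "issue", "page", "template"}
            If Not parameters.ContainsKey(key) Then Return url
        Next

        Dim issue As String = parameters("issue")
        Dim page As String = parameters("page")
        If Not Regex.IsMatch(issue, "^\d+$") Then Return url
        If Not Regex.IsMatch(page, "^\d+$") Then Return url

        Return String.Format("{0}/{1}_{2}", site, issue, page)
    End Function

    Sub Main()
        ' Same parameters in two different orders give the same result.
        Console.WriteLine(RewriteIfUnitUrl("http://www.site.com/locale=euen&mag=testit&issue=322&page=5&template=testit-t1"))
        Console.WriteLine(RewriteIfUnitUrl("http://www.site.com/page=5&issue=322&template=testit-t1&mag=testit&locale=euen"))
        ' A URL without the parameters is returned unchanged.
        Console.WriteLine(RewriteIfUnitUrl("http://www.site.com/about"))
    End Sub
End Module
```

Unlike the anchored test pattern, this version also accepts extra parameters it does not know about; whether that is desirable depends on what the generating application can emit.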
Upvotes: 2