Reputation: 20511
I have this string (hundreds of them actually) containing URLs and I would like to update them.
Here's the old URL format
http://oldDomain/a/b/document.aspx?p1=v1&p2=NEEDED_VALUE&morePsHere=moreVsHere
and here's what I need them to look like after the update
http://newDomain/c/d/NEEDED_VALUE
Pretty much all I needed to do was to extract the value of p2
in the old URL and append it to http://newDomain/c/d/
to create the new URL.
I assumed the string I was going to get would look like this:
$s = "http://oldDomain/a/b/document.aspx?p1=v1&p2=001&morePsHere=moreVsHere,
http://oldDomain/a/b/document.aspx?p1=v1&p2=002&morePsHere=moreVsHere,
http://oldDomain/a/b/document.aspx?p1=v1&p2=003&morePsHere=moreVsHere"
and I was able to update it using the following:
$newURLStart = "http://newDomain/c/d/"
$newStr = $null
$s.Split(",") | ForEach {
    # IndexOf returns -1 when "p2=" is not found
    if ($_.IndexOf("p2=") -ne -1)
    {
        # Take everything after "p2=" ...
        $neededValue = $_.Substring($_.IndexOf("p2=")+3)
        if ($neededValue.IndexOf("&") -ne -1)
        {
            # ... and cut it off at the next parameter, if there is one
            $neededValue = $neededValue.Substring(0,$neededValue.IndexOf("&"))
        }
        $newStr = $newStr + ", " + $newURLStart + $neededValue
    }
}
$newStr = $newStr.TrimStart(", ")
$s = $newStr
BUT, it turns out that the string I'm going to get isn't plaintext and would actually look something like:
$s = '<div class="someClass"><p>SomeText</p><ul>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=001&morePsHere=moreVsHere">LINK ONE</a></li>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=002&morePsHere=moreVsHere">LINK TWO</a></li>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=003&morePsHere=moreVsHere">LINK THREE</a></li>
</ul></div>'
This is a bit more complex than my comma-delimited expectations! I need help updating my script to handle this. I'm thinking regex might come into play here to grab the URLs inside the href attributes, but I'm pretty new to regex.
Upvotes: 2
Views: 4015
Reputation: 52185
If you threw all the strings in a file you could do something like so:
Get-Content "testregex.html" | % {$_ -replace 'href="[^"]*[?&;]p2=([^&"]+)[^"]*"', 'href="http://newDomain/c/d/$1"'} | Set-Content "newtestregex.html"
Takes as input this file:
<div class="someClass"><p>SomeText</p><ul>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=001&morePsHere=moreVsHere">LINK ONE</a></li>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=002&morePsHere=moreVsHere">LINK TWO</a></li>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=003&morePsHere=moreVsHere">LINK THREE</a></li>
</ul></div>
Yields:
<div class="someClass"><p>SomeText</p><ul>
<li><a href="http://newDomain/c/d/001">LINK ONE</a></li>
<li><a href="http://newDomain/c/d/002">LINK TWO</a></li>
<li><a href="http://newDomain/c/d/003">LINK THREE</a></li>
</ul></div>
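If the HTML is already in a variable rather than a file, the same kind of -replace can be applied to the string directly. This is only a sketch: it assumes the markup sits in $s as in the question, and the pattern (which restricts itself to hrefs on oldDomain) is one possible way to isolate p2, not the only one.

```powershell
# Sample input shaped like the question's (single-quoted so nothing is interpolated)
$s = '<ul>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=001&morePsHere=moreVsHere">LINK ONE</a></li>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=002&morePsHere=moreVsHere">LINK TWO</a></li>
</ul>'

# Match a whole href attribute on oldDomain and capture the p2 value.
# [?&;] also accepts the ';' that ends an HTML-encoded '&amp;'.
$pattern = 'href="http://oldDomain/[^"]*[?&;]p2=([^&"]+)[^"]*"'

# -replace works on the whole multi-line string and rewrites every match;
# the replacement is single-quoted so $1 reaches the regex engine untouched
$s = $s -replace $pattern, 'href="http://newDomain/c/d/$1"'
$s
```

Links that point anywhere other than oldDomain are left alone, which may or may not be what you want for your real data.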
Upvotes: 1
Reputation: 2164
I simplified your input somewhat, but here it is. (BTW please please store this regex in a post-it next to your desk - it helps me again and again! :) )
I make the following assumptions:
Code:
# Here's the input.
# I assume you can figure out how to extract the <li> tags from your input
$ip = '<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=001&morePsHere=moreVsHere">LINK ONE</a></li>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=002&morePsHere=moreVsHere">LINK TWO</a></li>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=003&morePsHere=moreVsHere">LINK THREE</a></li>
'
# Loop through each line.
$ip -split "`n" | foreach {
    # -match emits True/False and, on success, fills $matches;
    # when a line does not match, $matches keeps its previous value
    $_ -match "(?<=p2=).*(?=&)"
    $matches
    # now insert the logic to put the regex match into your destination URL
}
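More info on the regex used (and a web result):
The -match operator puts the regex match in an automatic variable called $matches. (?<=p2=) is a lookbehind and (?=&) is a lookahead; together they tell PowerShell to match only the text that is bounded by p2= on the left and & on the right, without including either of them in the match. In this case that text is the value you need. Here's the output for $matches (the last entry repeats because the trailing empty line does not match, so $matches still holds the previous result):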
Name Value
---- -----
0 001
0 002
0 003
0 003
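The "insert the logic" step could be filled in roughly as follows. This is a sketch under assumptions, not the answerer's code: $newURLStart is borrowed from the question, $newUrls is a made-up name, and the pattern uses a character class instead of the lookahead so that a p2 value at the very end of the URL still matches.

```powershell
# Two sample lines in the same shape as the input above
$ip = '<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=001&morePsHere=moreVsHere">LINK ONE</a></li>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&p2=002&morePsHere=moreVsHere">LINK TWO</a></li>
'
$newURLStart = 'http://newDomain/c/d/'

$newUrls = $ip -split "`n" | ForEach-Object {
    # Only read $matches when -match succeeded; on a failed match it is stale.
    # Using -match as the if condition also keeps True/False out of the output.
    if ($_ -match '(?<=p2=)[^&"]+') {
        $newURLStart + $matches[0]
    }
}
$newUrls
```

Guarding with if means the trailing empty line produces nothing, so each value shows up exactly once.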
Upvotes: 1
Reputation: 12603
You can make this a bit easier by using PowerShell's excellent XML capabilities. First, convert your string into XML: $xmlData = [xml] $s. Note that the cast only succeeds if the markup is well-formed XML, which means the ampersands inside each href have to be written as &amp;. Now, we can simply navigate it using properties: $xmlData.div.ul.li.a.href will go into the HTML you got, and automatically expand into collections as needed:
PS C:\Users\carlpett> $xmlData.div.ul.li.a.href
http://oldDomain/a/b/document.aspx?p1=v1&p2=001&morePsHere=moreVsHere
http://oldDomain/a/b/document.aspx?p1=v1&p2=002&morePsHere=moreVsHere
http://oldDomain/a/b/document.aspx?p1=v1&p2=003&morePsHere=moreVsHere
Now, it's just a simple regex to do the actual replacement: $xmlData.div.ul.li.a.href -replace 'http://oldDomain/.+p2=([^&]+).*','http://newDomain/c/d/$1'
So, wrapping it up:
# [xml] needs well-formed XML, so escape any bare ampersands first
$xmlData = [xml] ($s -replace '&(?!#?\w+;)', '&amp;')
$xmlData.div.ul.li.a.href -replace 'http://oldDomain/.+p2=([^&]+).*','http://newDomain/c/d/$1'
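The commands above only print the rewritten URLs. If you want the updated markup back as a string, one option is to write each new value into its href attribute and serialize the document again. A minimal sketch, assuming the input is well-formed XML (the sample below is hypothetical and already uses &amp;):

```powershell
# Hypothetical sample; ampersands are written as &amp; so the [xml] cast succeeds
$s = '<div class="someClass"><p>SomeText</p><ul>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=001&amp;morePsHere=moreVsHere">LINK ONE</a></li>
<li><a href="http://oldDomain/a/b/document.aspx?p1=v1&amp;p2=002&amp;morePsHere=moreVsHere">LINK TWO</a></li>
</ul></div>'

$xmlData = [xml] $s

# Visit every <a> element that has an href attribute and rewrite it in place
foreach ($a in $xmlData.SelectNodes('//a[@href]')) {
    $old = $a.GetAttribute('href')   # already decoded, so it contains plain '&'
    $new = $old -replace 'http://oldDomain/.+p2=([^&]+).*', 'http://newDomain/c/d/$1'
    $a.SetAttribute('href', $new)
}

# Serialize the modified document back to a string
$s = $xmlData.OuterXml
$s
```

Because the XPath query finds the links wherever they are, this does not depend on the exact div/ul/li nesting. The [xml] cast drops whitespace-only text between elements by default, so the result comes back on a single line.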
Upvotes: 1