Reputation: 15
I am trying to filter a list of URLs, where some of the URLs have a "/" character after the domain name (.com, .pl, etc.). I am trying to write a PowerShell script that removes the "/" and any text after it from each URL.
I tried the script below, but it didn't work.
(Get-Content "C:\Work\url123.txt" -Raw) -replace "/" | Set-Content "C:\Work\url12.txt"
// this only removes the "/" characters and runs the parts of each URL together
Input
www.xyz.com
www.abc.com/dummypage/login
www.123.com/login.php?
Expected Output
www.xyz.com
www.abc.com
www.123.com
Upvotes: 0
Views: 705
Reputation: 25021
You can use the following if your URLs don't contain protocols.
(Get-Content "C:\Work\url123.txt") -Replace "(.*?)/.*",'$1'
If your list may also contain protocols (for example http:// or https://), then the following will work; it strips the protocol as well as the path:
(Get-Content "C:\Work\url123.txt") -Replace ".*//|(.*?)/.*",'$1'
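As a quick check, here is a minimal sketch that applies both patterns to in-memory strings instead of a file. The first three entries are the sample input from the question; the fourth entry, which has a protocol, is a hypothetical addition used only to exercise the second pattern.

```powershell
# Sample input from the question, plus one hypothetical entry with a protocol.
$urls = @(
    'www.xyz.com'
    'www.abc.com/dummypage/login'
    'www.123.com/login.php?'
    'https://www.abc.com/dummypage/login'   # hypothetical, not in the original input
)

# Pattern 1: keep everything before the first "/" (entries without a protocol only).
$urls[0..2] -replace '(.*?)/.*', '$1'
# www.xyz.com
# www.abc.com
# www.123.com

# Pattern 2: also drop a leading "protocol://" before trimming the path.
$urls -replace '.*//|(.*?)/.*', '$1'
# www.xyz.com
# www.abc.com
# www.123.com
# www.abc.com
```

When the left-hand side of -replace is an array, the operator is applied to each element, which is why Get-Content is used without -Raw in the commands above.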
Since the -Replace operator uses regex, I'll explain the syntax.
.*// : Matches all characters up to and including two forward slashes.
| : Alternation (OR).
(.*?) : Matches as few characters as possible (lazy matching) and stores them as capture group 1 ($1).
/ : Matches a forward slash literally.
$1 : Capture group 1.
Upvotes: 3
Reputation: 1140
You can use split:
$a = "ffff/666666/iiii"
$b = $a.Split('/') #is an array with all the substrings separated by /
$b[0] # is the first element
Result: 'ffff'
As a one-liner: $b = $a.Split('/')[0]
So the code should look like this (without -Raw, so that each line is processed separately):
Get-Content "C:\Work\url123.txt" | ForEach-Object { $_.Split('/')[0] } | Set-Content "C:\Work\url12.txt"
Upvotes: 1