Spikey

Reputation: 15

PowerShell to remove text after a special character in URL list

I am trying to filter a list of URLs, some of which have a "/" character after the domain name (.com, .pl, etc.). I want a PowerShell script that removes the "/" and any text after it from each URL.

I tried the script below, but it didn't work.

(Get-Content "C:\Work\url123.txt" -Raw) -replace "/" | Set-Content "C:\Work\url12.txt"

This only removes the "/" characters and runs the remaining text together.

Input

www.xyz.com

www.abc.com/dummypage/login

www.123.com/login.php?

Expected Output

www.xyz.com

www.abc.com

www.123.com

Upvotes: 0

Views: 705

Answers (2)

AdminOfThings

Reputation: 25021

You can use the following if your URLs don't contain protocols.

(Get-Content "C:\Work\url123.txt") -Replace "(.*?)/.*",'$1'

If your list may contain protocols (for example http://), the following will work. Note that it also strips the protocol from the result:

(Get-Content "C:\Work\url123.txt") -Replace ".*//|(.*?)/.*",'$1'

Since the -Replace operator uses regex, I'll explain the syntax.

  • .*//: Matches all characters up to and including two forward slashes (the protocol part).
  • |: Alternation (OR).
  • (.*?): Matches as few characters as possible (lazy matching) and stores them as capture group 1 ($1).
  • /.*: Matches a literal forward slash and everything after it.
  • $1: The replacement text, capture group 1. It is empty when the .*// alternative matched, so the protocol is dropped.
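
To see the pattern at work without touching any files, here is a minimal sketch that runs it against an in-memory array. The first three entries are the sample lines from the question; the https:// entry and the variable names $urls and $result are assumptions added for illustration.

```powershell
# Sample lines from the question, plus one hypothetical entry with a protocol.
$urls = @(
    'www.xyz.com'
    'www.abc.com/dummypage/login'
    'www.123.com/login.php?'
    'https://www.abc.com/dummypage/login'
)

# -replace applied to an array processes each element separately.
# Lines without a "/" do not match and pass through unchanged.
$result = $urls -replace '.*//|(.*?)/.*', '$1'

$result
# www.xyz.com
# www.abc.com
# www.123.com
# www.abc.com
```

Because .* in the first alternative is greedy, a line that contains a second "//" later in its path would be consumed up to that last "//", so this is best treated as a quick filter for simple lists like the one in the question.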

Upvotes: 3

TudorIftimie

Reputation: 1140

You can use split:

$a = "ffff/666666/iiii"
$b = $a.Split('/') #is an array with all the substrings separated by /
$b[0] # is the first element 

result: 'ffff'

one line: $b = $a.Split('/')[0]

so the code should look like:

Get-Content "C:\Work\url123.txt" | ForEach-Object { $_.Split('/')[0] } | Set-Content "C:\Work\url12.txt"
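
A minimal sketch of the split approach applied line by line to the sample input from the question, held in memory instead of read from a file. The variable names $lines and $hosts are made up for this example. Split has to run on each line separately, which is why the lines are piped through ForEach-Object and not handled as one raw string.

```powershell
# Sample lines from the question.
$lines = @(
    'www.xyz.com'
    'www.abc.com/dummypage/login'
    'www.123.com/login.php?'
)

# Split each line on "/" and keep only the first piece.
# A line without "/" yields a one-element array, so [0] is the whole line.
$hosts = $lines | ForEach-Object { $_.Split('/')[0] }

$hosts
# www.xyz.com
# www.abc.com
# www.123.com
```

This only suits lists without protocols: for a line such as https://www.abc.com/login, the first piece before "/" is "https:", not the host name.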

Upvotes: 1
