drd0sPy

Reputation: 63

PowerShell remove dot in regex

I have the following one-liner in PowerShell:

cat raw.txt | select-string -Pattern "\A[s]\w{1,12}\.\b" -AllMatches | % { $_.Matches } | % { $_.Value }

Returns:

saltri.
swoptimusprime.
swdecepticons.

How can I remove the trailing dots (".") from the hostnames in this output?

Thanks in advance

Upvotes: 1

Views: 1613

Answers (2)

mklement0

Reputation: 437638

PetSerAl, in a comment on the question, provided the crucial pointer: use a positive lookahead assertion ((?=...)) to match an additional part of the input, without including that part in the captured match.

If we apply this to your solution and simplify it, we get:

Get-Content raw.txt | % { if ($_ -match '^s\w{1,12}(?=\.\b)') { $matches[0] } }

The sub-expression \.\b - a literal . followed by \b, a zero-width word-boundary assertion that here requires a word character right after the . - must be present for the match to succeed, but it is not included in the match; that is, $matches[0], the element of the automatic $matches variable that contains the matched part of the string, does not include the .
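
To see the lookahead in isolation, here is a minimal sketch that runs the same regex against in-memory strings instead of raw.txt. The sample lines and the example.com domain are assumptions made up for illustration; the real file contents aren't shown in the question.

```powershell
# Hypothetical sample lines standing in for the contents of raw.txt.
$lines = 'saltri.example.com', 'swdecepticons.example.com', 'mail.example.com'

$hostNames = foreach ($line in $lines) {
    # (?=\.\b) requires a dot followed by a word character,
    # but neither becomes part of the match stored in $Matches[0].
    if ($line -match '^s\w{1,12}(?=\.\b)') { $Matches[0] }
}

# Emits the two names that start with "s", without the trailing dot.
$hostNames
```

The third sample line doesn't start with s, so -match returns $false for it and nothing is emitted.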

However, since we're using -match and accessing the special $matches variable afterward, we may simplify matters with a capture group ((...)) in the regex, whose captured substring we can access by index 1, since it is the first (and only) capture group in the regex:

Get-Content raw.txt | % { if ($_ -match '^(s\w{1,12})\.\b') { $matches[1] } }

Notes on your solution attempt (aside from including the . in the match):

  • You're using Get-Content without switch -Raw, which means that the input lines are sent individually through the pipeline:

    • Therefore, there's no reason to use \A instead of the more familiar start-of-string/line anchor ^, because the two differ only when the Multiline regex option ((?m)) is in effect, which matters only for multi-line input.
    • Because you're anchoring the match at the start of the line, the -AllMatches option is pointless, because by definition there can be at most 1 match per line.
  • As you can see, a single % (ForEach-Object) block with -match is sufficient in this case and simplifies matters; unlike Select-String, it doesn't return match information that isn't needed here, and it performs better.
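
If you'd rather stay with Select-String, the lookahead can be used there as well. The following is only a sketch under the same assumption as before: the sample lines are hypothetical stand-ins for raw.txt. Without -AllMatches, each result still exposes its single match via the Matches property.

```powershell
# Hypothetical sample lines standing in for the contents of raw.txt.
$lines = 'saltri.example.com', 'swdecepticons.example.com', 'mail.example.com'

# Select-String emits one MatchInfo object per matching line;
# Matches[0].Value is the matched text, which excludes the dot
# because the dot is only asserted by the lookahead.
$viaSelectString = $lines |
    Select-String -Pattern '^s\w{1,12}(?=\.\b)' |
    ForEach-Object { $_.Matches[0].Value }

$viaSelectString
```

To read from the file instead, Select-String -Path raw.txt with the same -Pattern should behave equivalently.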

Upvotes: 1

restless1987

Reputation: 1598

As I don't know what your text looks like, grouping the hostname should be sufficient (shorter attempt via -match):

cat raw.txt | % { if ($_ -match "\A([s]\w{1,12})\.\b") { $matches[1] } }

Upvotes: 0
