Michael S Palatsi

Reputation: 91

RegEx to match string between two strings in Powershell

Here is my sample data:

Option failonnomatch on
Option batch on
Option confirm Off
open sftp://username:password@host.name.net:22 hostkey="ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00"

get File*.txt \local\path\Client\File.txt
mv File*.txt /remote/archive/

close
exit

I would like to create a PowerShell script to extract pieces of information from this text file.

List of items I need:

I'm hoping that if I learn how to do a couple of these, the method will apply to all items. I attempted to extract the SSH key with the following PowerShell regex:

$doc -match '(?<=hostkey=")(.*)(?=")' 

where $doc contains the sample data, but it appears to return the whole line instead of just the key. Any help would be greatly appreciated. Thank you.

Upvotes: 2

Views: 4856

Answers (2)

mklement0

Reputation: 437111

If -match is returning a whole line, the implication is that the LHS of your -match operation is an array, which in turn suggests that you used Get-Content without -Raw; that yields the input as an array of lines. With an array LHS, -match acts as a filter: it returns the matching elements (here, whole lines) rather than a [bool], and it does not populate $Matches.

Instead, read your file as a single, multi-line string with Get-Content -Raw; with a scalar LHS, -match then returns a [bool], and the results of the matching operation are reported in the automatic variable $Matches (a hashtable whose 0 entry contains the overall match, 1 what the 1st capture group matched, and so on):

# Read file as a whole, into a single, multi-line string.
$doc = Get-Content -Raw file.txt 

if ($doc -match '(?<=hostkey=")(.*)(?=")') {
   # Output what the 1st capture group captured
   $Matches[1]
}

With your sample input, the above yields:
ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
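To see the array-LHS versus scalar-LHS behavior side by side, here is a minimal sketch. The two-line sample in $lines is a trimmed, hypothetical stand-in for the file content, and the variable names are illustrative only:

```powershell
# Hypothetical, trimmed sample; Get-Content file.txt would give an array like this,
# while Get-Content -Raw file.txt would give a single string.
$lines = @(
  'Option batch on'
  'open sftp://username:password@host.name.net:22 hostkey="ssh-rsa 1024 00:00"'
)

# Array LHS: -match acts as a filter and returns the matching elements (whole lines).
$filtered = $lines -match 'hostkey="(.*)"'

# Scalar LHS: join the lines into one multi-line string first;
# -match now returns a [bool] and populates $Matches.
$single  = $lines -join "`n"
$isMatch = $single -match 'hostkey="(.*)"'
$key     = $Matches[1]   # what the 1st capture group matched

$filtered   # the whole "open sftp://..." line
$isMatch    # True
$key        # ssh-rsa 1024 00:00
```

Note that with an array LHS, $Matches is not populated, so it may still hold the results of an earlier scalar match.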


You can then extend the approach to capture multiple tokens, in which case I suggest using named capture groups ((?<name>...)); the following example uses such named capture groups to extract several of the tokens of interest:

if ($doc -match '(?<=sftp://)(?<username>[^:]+):(?<password>[^@]+)@(?<host>[^:]+)'){
  # Output the named capture-group values.
  # Note that index notation (['username']) and property
  # notation (.username) can be used interchangeably.
  $Matches.username
  $Matches.password
  $Matches.host
}

With your sample input, the above yields:

username
password
host.name.net

You can extend the above to capture all tokens of interest.
Note that . by default doesn't match \n (newline) characters.
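If you do need . to cross line boundaries, the s (Singleline) inline option, (?s), changes that. A minimal sketch using a hypothetical two-line string (the group name rest is illustrative):

```powershell
# Hypothetical two-line input: the hostkey value, then a newline, then a "get" line.
$text = "hostkey=`"abc`nget File.txt"

# Without (?s), '.' does not match "`n", so '.+' stops at the end of the first line.
$null = $text -match 'hostkey="(?<rest>.+)'
$withoutS = $Matches.rest   # abc

# With the inline (?s) option, '.' also matches "`n", so '.+' runs to the end of the string.
$null = $text -match '(?s)hostkey="(?<rest>.+)'
$withS = $Matches.rest      # abc, newline, get File.txt
```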


Optional reading: Using the x (IgnoreWhiteSpace) option to make regexes more readable:

Extracting that many tokens can result in a complex regex that is hard to read, in which case the x (IgnoreWhiteSpace) regex option can help (specified inline as (?x) at the start of the regex):

if ($doc -match '(?x)
    (?<=sftp://)(?<username>[^:]+)
    :(?<password>[^@]+)
    @(?<host>[^:]+)
    :(?<port>\d+)
    \s+hostkey="(?<sshkey>.+?)"
    \s+get\ File\*\.txt\ (?<localpath>[^\r\n]+)
    \s+mv\ File\*\.txt\ (?<remotepath>[^\r\n]+)
  '){
    # Output the named capture-group values, excluding entry 0 (the overall match).
    # \s+ and [^\r\n]+ work with both LF and CRLF (Windows) line endings.
    $Matches.GetEnumerator() | Where-Object Key -ne 0
}

Note how the whitespace used for making the regex more readable (spreading it across multiple lines) is ignored while matching, whereas whitespace to be matched in the input must be escaped or otherwise expressed explicitly: e.g., \ (a backslash followed by a space) or [ ] to match a single space, or \s to match any whitespace character.
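A quick way to verify this rule is to try the same phrase with and without escaping the space. This is a minimal sketch with a hypothetical one-line input:

```powershell
$line = 'get File.txt'   # hypothetical input line

# In (?x) mode the unescaped space is ignored, so the pattern is effectively 'getFile': no match.
$plain   = $line -match '(?x) get File'

# An escaped space, a [ ] character class, or \s all match the literal space.
$escaped = $line -match '(?x) get\ File'
$bracket = $line -match '(?x) get[ ]File'
$anyWs   = $line -match '(?x) get\sFile'

$plain, $escaped, $bracket, $anyWs   # False, True, True, True
```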

With your sample input, the above yields the following:

Name                           Value
----                           -----
host                           host.name.net
localpath                      \local\path\Client\File.txt
port                           22
sshkey                         ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
remotepath                     /remote/archive/
password                       password
username                       username

Note that the reason the capture groups are out of order is that $Matches is a hash table (of type [hashtable]), whose key enumeration order is an implementation artifact: no particular enumeration order is guaranteed.

However, random access to capture groups works just fine; e.g., $Matches.port will return 22.
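If you need the capture groups in a predictable order, one option (a sketch, not the only approach) is to call the .NET [regex] API directly: Regex.GetGroupNames() returns the group names in group-number order, and named groups are numbered left to right (after any unnamed groups). The one-line $doc sample and the [ordered] result below are illustrative assumptions:

```powershell
# Hypothetical one-line sample, trimmed from the question's data.
$doc   = 'open sftp://username:password@host.name.net:22 hostkey="ssh-rsa 1024 00:00"'
$regex = [regex] '(?<=sftp://)(?<username>[^:]+):(?<password>[^@]+)@(?<host>[^:]+):(?<port>\d+)'

$m = $regex.Match($doc)
if ($m.Success) {
  # Copy the named groups into an ordered dictionary, in group-number order.
  $ordered = [ordered] @{}
  foreach ($name in $regex.GetGroupNames()) {
    if ($name -ne '0') { $ordered[$name] = $m.Groups[$name].Value }   # skip the overall match
  }
  $ordered   # username, password, host, port, in pattern order
}
```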

Upvotes: 1

Lee_Dailey
Lee_Dailey

Reputation: 7479

This uses named capture groups with the s (Singleline), m (Multiline), and i (IgnoreCase) inline options set, and then uses $Matches.<GroupName> to copy the values into a custom object.

# fake reading in a text file as one string
#    in real life, use Get-Content -Raw
$InStuff = @'
open sftp://username:password@host.name.net:22 hostkey="ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00"

get File*.txt \SERVER\Path\Client\File.txt
'@

$Null = $InStuff -match '(?smi).+//(?<UserName>.+):(?<Password>.+)@(?<HostName>.+):(?<Port>.+) hostkey="(?<SshKey>.+)".+get .+ (?<FullFileName>\\.+)$'

[PSCustomObject]@{
    UserName = $Matches.UserName
    Password = $Matches.Password
    Port = $Matches.Port
    SshKey = $Matches.SshKey
    PathName = Split-Path -Path $Matches.FullFileName -Parent
    FileName = Split-Path -Path $Matches.FullFileName -Leaf
    }

Output:

UserName : username
Password : password
Port     : 22
SshKey   : ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
PathName : \SERVER\Path\Client
FileName : File.txt
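To illustrate what the m (Multiline) option contributes, here is a minimal sketch with a hypothetical three-line string (the group name dest is illustrative): without (?m), ^ and $ anchor at the start and end of the entire string; with it, they also anchor at line boundaries.

```powershell
# Hypothetical three-line input modeled on the question's script.
$text = "open sftp://host`nget File*.txt \local\path\Client\File.txt`nmv File*.txt /remote/archive/"

# Without (?m), '^' only matches at the very start of the string, which begins with "open".
$noM = $text -match '^get (?<dest>.+)$'

# With (?m), '^get' matches at the start of the second line, and '$' before its newline.
$null = $text -match '(?m)^get \S+ (?<dest>.+)$'
$dest = $Matches.dest

$noM    # False
$dest   # \local\path\Client\File.txt
```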

Upvotes: 1
