Reputation: 91
Here is my sample data:
Option failonnomatch on
Option batch on
Option confirm Off
open sftp://username:password@host.name.net:22 hostkey="ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00"
get File*.txt \local\path\Client\File.txt
mv File*.txt /remote/archive/
close
exit
I would like to create a PowerShell script to extract pieces of information from this text file.
List of items I need:
I'm hoping that if I learn how to do a couple of these, the method will be applicable to all items. I attempted to extract the SSH key with the following PowerShell/regex:
$doc -match '(?<=hostkey=")(.*)(?=")'
where $doc is the sample data, but it appears to return the whole line. Any help would be greatly appreciated. Thank you.
Upvotes: 2
Views: 4856
Reputation: 437111
If -match is returning a whole line, the implication is that the LHS of your -match operation is an array, which in turn suggests that you used Get-Content without -Raw, which yields the input as an array of lines, in which case -match acts as a filter.
Instead, read your file as a single, multi-line string with Get-Content -Raw; with a scalar LHS, -match then returns a [bool], and the results of the matching operation are reported in automatic variable $Matches (a hashtable whose 0 entry contains the overall match, 1 what the 1st capture group matched, ...):
# Read file as a whole, into a single, multi-line string.
$doc = Get-Content -Raw file.txt
if ($doc -match '(?<=hostkey=")(.*)(?=")') {
  # Output what the 1st capture group captured
  $Matches[1]
}
With your sample input, the above yields
ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
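To see the difference between an array LHS and a scalar LHS side by side, here is a minimal sketch. The two-line input and the variable names ($lines, $text, $filtered, $found, $key) are made up for illustration; they stand in for what Get-Content returns without and with -Raw.

```powershell
# Hypothetical sample; $lines mimics Get-Content without -Raw (array of lines),
# $text mimics Get-Content -Raw (one multi-line string).
$lines = 'open sftp://host hostkey="abc"', 'exit'
$text  = $lines -join "`n"

# Array LHS: -match acts as a filter and returns the matching element(s).
$filtered = $lines -match '(?<=hostkey=")(.*)(?=")'

# Scalar LHS: -match returns a [bool] and populates $Matches.
$found = $text -match '(?<=hostkey=")(.*)(?=")'
$key   = $Matches[1]
```

Here $filtered holds the whole first line, whereas $found is $true and $key holds only the text between the double quotes.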
You can then extend the approach to capture multiple tokens, in which case I suggest using named capture groups ((?<name>...)); the following example uses such named capture groups to extract several of the tokens of interest:
if ($doc -match '(?<=sftp://)(?<username>[^:]+):(?<password>[^@]+)@(?<host>[^:]+)'){
  # Output the named capture-group values.
  # Note that index notation (['username']) and property
  # notation (.username) can be used interchangeably.
  $Matches.username
  $Matches.password
  $Matches.host
}
With your sample input, the above yields:
username
password
host.name.net
You can extend the above to capture all tokens of interest.
Note that . by default doesn't match \n (newline) characters.
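A quick way to check this, and to see the effect of the inline (?s) (single-line) option, which makes . match newlines too. This is only a sketch; the two-line string and the variable names are made up for illustration.

```powershell
# Hypothetical two-line input.
$text = "first`nsecond"

# Default: . stops at the newline, so only the first line is captured.
$null = $text -match '(?<all>.+)'
$default = $Matches.all

# With the inline (?s) option, . also matches newline characters.
$null = $text -match '(?s)(?<all>.+)'
$singleLine = $Matches.all
```

$default contains only first, while $singleLine spans both lines. Note that . does match \r, so with CRLF line endings a .+ that runs to the end of a line also captures the trailing carriage return.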
Using the x (IgnorePatternWhitespace) option to make regexes more readable:
Extracting that many tokens can result in a complex regex that is hard to read, in which case the x (IgnorePatternWhitespace) regex option can help (as an inline option, (?x) at the start of the regex):
if ($doc -match '(?x)
(?<=sftp://)(?<username>[^:]+)
:(?<password>[^@]+)
@(?<host>[^:]+)
:(?<port>\d+)
\s+hostkey="(?<sshkey>.+?)"
\n+get\ File\*\.txt\ (?<localpath>.+)
\nmv\ File\*\.txt\ (?<remotepath>.+)
'){
  # Output the named capture-group values.
  $Matches.GetEnumerator() | ? Key -ne 0
}
Note how the whitespace used for making the regex more readable (spreading it across multiple lines) is ignored while matching, whereas whitespace to be matched in the input must be escaped (e.g., to match a single space, use \ followed by a space, or [ ]; use \s to match any whitespace char.)
With your sample input, the above yields the following:
Name                           Value
----                           -----
host                           host.name.net
localpath                      \local\path\Client\File.txt
port                           22
sshkey                         ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
remotepath                     /remote/archive/
password                       password
username                       username
Note that the reason the capture groups are out of order is that $Matches is a hash table (of type [hashtable]), whose key enumeration order is an implementation artifact: no particular enumeration order is guaranteed.
However, random access to capture groups works just fine; e.g., $Matches.port will return 22.
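If a predictable order matters, one option is to copy the values you need into a [pscustomobject], whose properties keep the order in which they are listed. A minimal sketch, assuming a shortened one-line input; the property names and the $result variable are chosen for illustration only:

```powershell
# Hypothetical one-line input; real input would come from Get-Content -Raw.
$doc = 'open sftp://username:password@host.name.net:22'

if ($doc -match '(?<=sftp://)(?<username>[^:]+):(?<password>[^@]+)@(?<host>[^:]+):(?<port>\d+)') {
  # Properties are emitted in the order they are listed here,
  # independent of the enumeration order of $Matches.
  $result = [pscustomobject]@{
    UserName = $Matches.username
    Password = $Matches.password
    HostName = $Matches.host
    Port     = $Matches.port
  }
  $result
}
```

Capture-group values are strings, so cast where a number is needed, e.g. [int] $Matches.port.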
Upvotes: 1
Reputation: 7479
This uses named capture groups with the regex flags set to single-line, multi-line, and case-insensitive ((?smi)), and then uses $Matches.MatchName to get the items into a custom object.
# fake reading in a text file as one string
# in real life, use Get-Content -Raw
$InStuff = @'
open sftp://username:password@host.name.net:22 hostkey="ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00"
get File*.txt \SERVER\Path\Client\File.txt
'@
$Null = $InStuff -match '(?smi).+//(?<UserName>.+):(?<Password>.+)@(?<HostName>.+):(?<Port>.+) hostkey="(?<SshKey>.+)".+get .+ (?<FullFileName>\\.+)$'
[PSCustomObject]@{
  UserName = $Matches.UserName
  Password = $Matches.Password
  Port = $Matches.Port
  SshKey = $Matches.SshKey
  PathName = Split-Path -Path $Matches.FullFileName -Parent
  FileName = Split-Path -Path $Matches.FullFileName -Leaf
}
output ...
UserName : username
Password : password
Port : 22
SshKey : ssh-rsa 1024 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
PathName : \SERVER\Path\Client
FileName : File.txt
Upvotes: 1