
Reputation: 768

Split and regex match with Powershell

Say I have a filename string, something like:

test_ABC_19000101_010101.987.txt
Here, "test" could be any combination of whitespace, letters, digits, etc. I want to extract the 19000101_010101 part (date and time) with PowerShell. Currently I assign the result of -split "_ABC_" to a variable, take the second element of the array, and then split that string several more times. Is there a way to accomplish this in one go?


"_ABC_" is constant, occurring unchanged in all instances of filename(s).

Upvotes: 2

Views: 918

Answers (3)


Reputation: 440162

A more concise - albeit perhaps more obscure - alternative to Santiago Squarzon's helpful answer:

# Construct a regex that consumes the entire file name while
# using capture groups for the parts of interest.
$re = '.+_ABC_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.(\d{3})\..+'

[datetime] (
  # In the replacement string, use $1, $2, ... to refer to what the
  # first, second, ... capture group captured.
  'test_ABC_19000101_010101.987.txt' -replace $re, '$1-$2-$3T$4:$5:$6.$7'
)

Monday, January 1, 1900 1:01:01 AM

The -replace operation results in string '1900-01-01T01:01:01.987', which is a (culture-invariant) format that you can use as-is with a [datetime] cast.
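
One caveat worth guarding against: -replace returns its input unchanged when the regex doesn't match, so the [datetime] cast then fails for a file name that lacks the expected pattern. A minimal sketch of a guard follows; the helper name ConvertTo-FileTimestamp and the second sample name are assumptions made for illustration only:

```powershell
# Hypothetical helper (the name is an assumption): returns a [datetime]
# for matching names and $null for everything else.
function ConvertTo-FileTimestamp {
    param([string] $Name)

    $re = '.+_ABC_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.(\d{3})\..+'

    # -replace would pass a non-matching name through unchanged,
    # so test for a match before casting.
    if ($Name -notmatch $re) { return $null }

    [datetime] ($Name -replace $re, '$1-$2-$3T$4:$5:$6.$7')
}

$good = ConvertTo-FileTimestamp 'test_ABC_19000101_010101.987.txt'
$bad  = ConvertTo-FileTimestamp 'unrelated.txt'   # made-up non-matching name
```

Here $good holds the parsed timestamp, while $bad is $null instead of raising a conversion error.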

Note that with a Get-ChildItem call as input you could slightly simplify the regex by providing $_.BaseName rather than $_.Name as the -replace LHS, which obviates the need to also match the extension (\..+) in the regex.
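
A sketch of that simplification. To stay runnable without real files, it derives the base name from a literal via [IO.Path]::GetFileNameWithoutExtension, which for this name yields the same string that $_.BaseName would; the variable names are illustrative:

```powershell
# Same regex as above, minus the trailing '\..+' that matched the extension.
$re = '.+_ABC_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.(\d{3})'

# Stand-in for $_.BaseName: 'test_ABC_19000101_010101.987'
$baseName = [IO.Path]::GetFileNameWithoutExtension('test_ABC_19000101_010101.987.txt')

$timestamp = [datetime] ($baseName -replace $re, '$1-$2-$3T$4:$5:$6.$7')

# With real files (assuming every file name matches the pattern):
# Get-ChildItem -File | ForEach-Object {
#     [datetime] ($_.BaseName -replace $re, '$1-$2-$3T$4:$5:$6.$7')
# }
```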

An aside re the [datetime] cast: [datetime] '...' results in a [datetime] instance that is an unspecified timestamp (its .Kind property value is Unspecified), i.e. it is undefined whether it represents a Local or a Utc timestamp.

To get a Local timestamp, use
[datetime]::Parse('...', [cultureinfo]::InvariantCulture, 'AssumeLocal')
(use 'AssumeLocal, AdjustToUniversal' to get a Utc timestamp).
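
To make the difference visible, here is a small sketch that parses the same string three ways and inspects .Kind; the variable names are illustrative:

```powershell
$iso = '1900-01-01T01:01:01.987'

# Plain cast: the kind is left unspecified.
$unspecified = [datetime] $iso

# Treat the string as local time.
$local = [datetime]::Parse($iso, [cultureinfo]::InvariantCulture, 'AssumeLocal')

# Treat the string as local time, then convert the result to UTC.
$utc = [datetime]::Parse($iso, [cultureinfo]::InvariantCulture, 'AssumeLocal, AdjustToUniversal')

$unspecified.Kind, $local.Kind, $utc.Kind   # Unspecified, Local, Utc
```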

Alternatively, you can cast to [datetimeoffset] - a type that is generally preferable to [datetime] - which interprets a string cast to it as local by default. (You can then access its .LocalDateTime / .UtcDateTime properties to get Local / Utc [datetime] instances).
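
A brief sketch of the [datetimeoffset] route, again with illustrative variable names:

```powershell
# The cast interprets the string as local time and records the local UTC offset.
$dto = [datetimeoffset] '1900-01-01T01:01:01.987'

$localStamp = $dto.LocalDateTime   # [datetime] with Kind Local
$utcStamp   = $dto.UtcDateTime     # [datetime] with Kind Utc

$localStamp.Kind, $utcStamp.Kind   # Local, Utc
```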

Upvotes: 3


Reputation: 16266

If the filename will never contain more than one sequence that looks like the timestamp (8 digits, _, 6 digits), then you could match on that pattern of digits.

PS C:\> 'test_ABC_19000101_010101.987.txt' -match '^.*ABC_(\d{8}_\d{6})\..*'
PS C:\> $Matches

Name                           Value
----                           -----
1                              19000101_010101
0                              test_ABC_19000101_010101.987.txt

PS C:\> $Matches[1]
19000101_010101

You would use the filename instead of the explicit string.
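
For instance, a rough sketch over a list of names; the second name is made up to show a non-match. The if guard matters because the automatic $Matches variable is not reset when -match fails, so an unguarded $Matches[1] could return a stale value from an earlier name:

```powershell
# Sample names (assumed for illustration); in practice these would come
# from Get-ChildItem, e.g. (Get-ChildItem -File).Name
$names = 'test_ABC_19000101_010101.987.txt', 'notes.txt'

$stamps = @(
    foreach ($name in $names) {
        # Only read $Matches when the current name actually matched.
        if ($name -match '^.*ABC_(\d{8}_\d{6})\..*') { $Matches[1] }
    }
)

$stamps   # 19000101_010101
```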

If you want to get a [System.DateTime] from it:

PS C:\> [datetime]::ParseExact($Matches[1], 'yyyyMMdd_HHmmss', $null)

Monday, January 1, 1900 01:01:01
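
Passing $null as the third argument means the current culture is used. A hedged variation passes [cultureinfo]::InvariantCulture explicitly, so the result doesn't depend on the session's culture (a culture whose default calendar is not Gregorian may interpret the year differently), and also captures the milliseconds. It assumes exactly three digits always follow the time part:

```powershell
$name = 'test_ABC_19000101_010101.987.txt'

# Capture date, time, and the 3-digit millisecond part in one group.
if ($name -match 'ABC_(\d{8}_\d{6}\.\d{3})\.') {
    $timestamp = [datetime]::ParseExact(
        $Matches[1],                      # '19000101_010101.987'
        'yyyyMMdd_HHmmss.fff',
        [cultureinfo]::InvariantCulture
    )
}
```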

Upvotes: 2

Santiago Squarzon

Reputation: 61103

This regex may be overkill, but I think it should work as long as _ABC_ is constant, a _ separates the date from the time, and a . separates the time from the milliseconds:

$re = [regex]'(?<=_ABC_)(?<date>\d*)_(?<time>\d*)\.(?<millisec>\d*)(?=\.)'

@'
t' az@ 0est_ABC_20000101_090101.123.txt
te098d $st_ABC_22000101_070101.789.txt
'@ -split '\r?\n' | ForEach-Object {

    $groups = $re.Match($_).Groups
    $date = $groups['date']
    $time = $groups['time']
    $msec = $groups['millisec']

    # Parse the three captured pieces with an exact format string.
    [datetime]::ParseExact(
        "$date $time $msec",
        "yyyyMMdd HHmmss fff",
        [cultureinfo]::InvariantCulture
    )
}

See https://regex101.com/r/8oSpqf/1 for details.

Upvotes: 2
