mack

Reputation: 2965

Powershell Regex for mm-dd-yyyy

I am using PowerShell to search through a large file for all strings in mm-dd-yyyy format. I then need to extract each string to check whether the date is valid. The script mostly works, but it returns too many results and doesn't give me all the information I need. The file contains strings like 012-34-5678. For these, the regex matches 12-34-5678, which is then reported as an invalid date. I also can't get the line number on which the invalid date was found. Can someone please look at my script below and tell me what I'm doing wrong?

The two commented-out lines return the line number and the entire line for each match. However, I don't know how to extract just the mm-dd-yyyy part of the line and check whether it is a valid date.

Any help would be greatly appreciated. Thanks.

#$matches = Select-String -Pattern $regex -AllMatches -Path "TestFile_2013_01_06.xml"

#$matches | Select LineNumber,Line


$regex = "\d{2}-\d{2}-\d{4}"     

$matches = Select-String -Pattern $regex -AllMatches -Path "TestFile_2013_01_06.xml" |
   Foreach {$_.Matches | Foreach {$_.Groups[0] | Foreach {$_.Value}}}

foreach ($match in $matches) {

    #$date = [datetime]::parseexact($match,"MM-dd-yyyy",$null)  

    if (([Boolean]($match -as [DateTime]) -eq $false ) -or ([datetime]::parseexact($match,"MM-dd-yyyy",$null).Year -lt "1800")) {
        write-host "Failed $match"
    }
}

Upvotes: 2

Views: 17650

Answers (3)

BartekB

Reputation: 8660

I would probably just link each Select-String result to the actual matches found on that line. I haven't included the condition that checks whether the date is recent enough:

Select-String -Pattern '\d{2}-\d{2}-\d{4}' -Path TestFile_2013_01_06.xml -AllMatches | 
    ForEach-Object {
        $Info = $_ | 
            Add-Member -MemberType NoteProperty -Name Date -Value $null -PassThru |
            Add-Member -MemberType NoteProperty -Name Captured -Value $null -PassThru
        foreach ($Match in $_.Matches) {
            try {
                $Date = [DateTime]::ParseExact($Match.Value,'MM-dd-yyyy',$null)
            } catch {
                $Date = 'NotValid'
            } finally {
                $Info.Date = $Date
                $Info.Captured = $Match.Value
                $Info
            }
        }
    } | Select Line, LineNumber, Date, Captured

When I tried it on some sample data, I got something like this:

Line                                  LineNumber Date                Captured  
----                                  ---------- ----                --------  
Test 12-12-2012                                1 2012-12-12 00:00:00 12-12-2012
Test another 12-40-2030                        2 NotValid            12-40-2030
20-20-2020 And yet another 01-01-1999          3 NotValid            20-20-2020
20-20-2020 And yet another 01-01-1999          3 1999-01-01 00:00:00 01-01-1999
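Here is a minimal sketch of one way to add the "new enough" check that is left out above. It assumes the 1800 cutoff from the question. The function name `Get-DateMatch`, the `-MinYear` parameter and the sample file are hypothetical and exist only for illustration. It also creates a fresh object for each match instead of reusing the Select-String result with Add-Member. That way the output stays correct even if the results are collected into a variable instead of being streamed straight into Select.

```powershell
# Hypothetical helper: emits one object per captured mm-dd-yyyy string,
# carrying the line number from Select-String's MatchInfo.
function Get-DateMatch {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $MinYear = 1800   # assumed cutoff, taken from the question
    )
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    Select-String -Pattern '\d{2}-\d{2}-\d{4}' -LiteralPath $Path -AllMatches |
        ForEach-Object {
            $info = $_
            foreach ($m in $info.Matches) {
                $date = [datetime]::MinValue
                # TryParseExact returns $false instead of throwing on e.g. 12-40-2030
                $parsed = [datetime]::TryParseExact($m.Value, 'MM-dd-yyyy', $culture,
                    [System.Globalization.DateTimeStyles]::None, [ref]$date)
                if (-not $parsed) { $date = $null }
                [pscustomobject]@{
                    LineNumber = $info.LineNumber
                    Captured   = $m.Value
                    Date       = $date
                    Valid      = $parsed -and $date.Year -ge $MinYear
                }
            }
        }
}

# Usage against a throwaway sample file
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'date-sample.txt'
Set-Content -LiteralPath $sample -Value @(
    'Test 12-12-2012',
    'Test another 12-40-2030',
    '20-20-2020 And yet another 01-01-1999',
    'Too old 01-01-1700'
)
$results = @(Get-DateMatch -Path $sample)
$results | Format-Table LineNumber, Captured, Date, Valid
```

Lines with several dates (like line 3 above) produce one object per date, each carrying the same LineNumber.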

Upvotes: 0

Keith Hill

Reputation: 202072

The line number is available on the object that Select-String outputs, but you're not capturing it in $matches. Try this:

$matchInfos = @(Select-String -Pattern $regex -AllMatches -Path "TestFile_2013_01_06.xml")
foreach ($minfo in $matchInfos)
{
    #"LineNumber $($minfo.LineNumber)"
    foreach ($match in @($minfo.Matches | Foreach {$_.Groups[0].Value}))
    {
        # TryParseExact returns $false for strings such as 12-40-2030 instead of throwing
        $date = [datetime]::MinValue
        $isDate = [datetime]::TryParseExact($match, "MM-dd-yyyy",
            [Globalization.CultureInfo]::InvariantCulture,
            [Globalization.DateTimeStyles]::None, [ref]$date)
        if (-not $isDate -or $date.Year -lt 1800) {
            Write-Host "Failed $match on line $($minfo.LineNumber)"
        }
    }
}
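A common pitfall in checks like this is using `-is` or `-isnot [DateTime]` to decide whether a match is a date. Those operators test the object's runtime type, and a regex match value is always a `[string]`. `-as [DateTime]`, by contrast, attempts a conversion and yields `$null` on failure. The short sketch below contrasts the two on made-up values. It assumes PowerShell's usual string-to-DateTime conversion, which uses the invariant culture and is therefore month-first.

```powershell
$value = '12-12-2012'   # a Match.Value from Select-String is always a [string]

# -isnot tests the runtime type, so this is $true for every match, valid or not
$isNotDate = $value -isnot [datetime]

# -as attempts a conversion and yields $null on failure
$converted = $value -as [datetime]
$bad = '12-40-2030' -as [datetime]

"isnot: $isNotDate; converted: $($converted.ToString('yyyy-MM-dd')); bad is null: $($null -eq $bad)"
```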

Upvotes: 2

Ken White

Reputation: 125757

You can do a lot of the validation in the regex itself, by making it more robust:

$regex = "(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9]{2}"

The above matches dates from 01/01/1900 through 12/31/2099 and accepts forward slashes, dashes, spaces, and dots as the date separator. It does not reject impossible dates such as February 30 or 31, November 31, etc.
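Because the regex alone cannot rule out February 30, one option is to use it only as a prefilter and let .NET do the calendar check. The sketch below is an illustration built on assumptions and is not part of the answer. It wraps the pattern in `(?<!\d)` and `(?!\d)` lookarounds, which also stops 012-12-2012 from yielding 12-12-2012, the problem described in the question. It then normalizes whichever separator was used to `-` and calls `[datetime]::TryParseExact`. The sample lines are made up.

```powershell
# Ken White's pattern wrapped in lookarounds (an addition) so a match cannot
# start or end in the middle of a longer digit run such as 012-12-2012.
$regex = '(?<!\d)(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)[0-9]{2}(?!\d)'
$culture = [System.Globalization.CultureInfo]::InvariantCulture

$lines = @(
    'ok 02-28-2012',
    'regex passes, calendar fails 02-30-2012',
    'embedded 012-12-2012',
    'slashes 12/31/2099'
)

$report = foreach ($line in $lines) {
    foreach ($m in [regex]::Matches($line, $regex)) {
        # Normalize the separator to '-' so one exact format covers all four variants
        $text = $m.Value -replace '[ /.]', '-'
        $date = [datetime]::MinValue
        $ok = [datetime]::TryParseExact($text, 'MM-dd-yyyy', $culture,
            [System.Globalization.DateTimeStyles]::None, [ref]$date)
        [pscustomobject]@{ Captured = $m.Value; CalendarValid = $ok }
    }
}
$report
```

The regex keeps obvious non-dates out cheaply, and TryParseExact handles month lengths and leap years, which are awkward to express in a pattern.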

Upvotes: 6
