Arbelac

Reputation: 1904

Extract multiple specific strings from lines via regex

I have been trying to extract certain values from multiple lines of a .txt file with PowerShell. I have a huge file listing all the backups, and I am trying to extract those lines for each one.

Txt file:

Backup-ID:           hostname01
Policy:              VM_weekly
Primary Copy:        23
Expires:             1/5/2024 3:19:13 AM
Type:                4


Copy Number:        2
Fragment Size (KB): 6188832
Expires:            1/5/2024 3:19:13 AM
MediaID:            XXX122
TestID:             1222
Block:              33


Copy Number:        3
Fragment Size (KB): 6188832
Expires:            1/5/2024 3:19:13 AM
MediaID:            XXX134
TestID:             223
Block:              22
Duplicate:          N



Backup-ID:           hostname02
Policy:              VM_weekly2
Primary Copy:        24
Expires:             1/5/2024 3:19:13 AM
Type:                2


Copy Number:        2
Fragment Size (KB): 6188832
Expires:            1/5/2024 3:19:13 AM
MediaID:            XXX244
Comp:               BBB
Block:              45
Duplicate:          N


Copy Number:        3
Fragment Size (KB): 6188832
Expires:            1/5/2024 3:19:13 AM
MediaID:            XXX199
Comp:               AA
Block:              334

Copy Number:        4
Fragment Size (KB): 6188832
Expires:            1/5/2024 3:19:13 AM
MediaID:            XXX177

This is the code I have so far:

Get-Content C:\test.txt | Select-String -Pattern 'Backup-ID: ' ,'Policy: ' ,'Primary Copy: ' ,'Expires:  ' ,'Copy Number: ' , 'Fragment Size ' ,'Expires: ' , 'MediaID:'

This is the output I want:

hostname01,VM_weekly,23,1/5/2024 3:19:13 AM,2,6188832,1/5/2024 3:19:13 AM,XXX122,3,6188832,1/5/2024 3:19:13 AM,XXX134
hostname02,VM_weekly2,24,1/5/2024 3:19:13 AM,2,6188832,1/5/2024 3:19:13 AM,XXX244,3,6188832,1/5/2024 3:19:13 AM,XXX199,4,6188832,1/5/2024 3:19:13 AM,XXX177

Upvotes: 2

Views: 134

Answers (3)

JosefZ

Reputation: 30113

Here's my old-school approach:

$line = ''
Get-Content C:\test.txt | 
    Select-String -Pattern 'Backup-ID: ' ,'Policy: ' ,'Primary Copy: ' ,'Expires:  ' ,'Copy Number: ' , 'Fragment Size ' ,'Expires: ' , 'MediaID:' |
        ForEach-Object {
            $aux = $_  -split ':',2            # only 2 substrings
            if ($aux[0] -eq 'Backup-ID') {
                if ( $line -ne '' ) { $line }  # Write-Output (current line)
                $line = $aux[1].Trim()
            } else {
                $line += ',' + $aux[1].Trim()
            }
        }
$line                                           # Write-Output (last line)

Output:

D:\PShell\SO\54921319.ps1
hostname01,VM_weekly,23,1/5/2024 3:19:13 AM,2,6188832,1/5/2024 3:19:13 AM,XXX122,3,6188832,1/5/2024 3:19:13 AM,XXX134
hostname02,VM_weekly2,24,1/5/2024 3:19:13 AM,2,6188832,1/5/2024 3:19:13 AM,XXX244,3,6188832,1/5/2024 3:19:13 AM,XXX199,4,6188832,1/5/2024 3:19:13 AM,XXX177

Edit: … I need to export CSV file ….

$xArr = D:\PShell\SO\54921319.ps1
$xCsv = $xArr |  ConvertFrom-Csv -Header $(1..30|%{"a$_"})
$xcsv | Export-Csv -NoTypeInformation -Path c:\temp\result.csv

Of course, you can compute

  • the actual upper limit for -Header $(1..30|%{"a$_"}) instead of the estimated 30, e.g. as ($xArr | % {$_.Split(',').Count}|Measure-Object -Maximum).Maximum,
  • or even some human-readable headers (keeping in mind that some property names recur for each Copy Number inside a given Backup-ID).
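
Both ideas can be sketched roughly as follows. This is only an illustration, not part of the original script: the header names (BackupID, CopyNumber_1 and so on) are made up, and the sketch assumes every Backup-ID contributes the same four leading fields and every copy contributes the same four fields in the same order. The two sample rows are copied from the output above.

```powershell
# Sample rows, copied from the script output shown earlier.
$xArr = @(
    'hostname01,VM_weekly,23,1/5/2024 3:19:13 AM,2,6188832,1/5/2024 3:19:13 AM,XXX122,3,6188832,1/5/2024 3:19:13 AM,XXX134'
    'hostname02,VM_weekly2,24,1/5/2024 3:19:13 AM,2,6188832,1/5/2024 3:19:13 AM,XXX244,3,6188832,1/5/2024 3:19:13 AM,XXX199,4,6188832,1/5/2024 3:19:13 AM,XXX177'
)

# The widest row decides how many columns the CSV needs.
$maxCols = ($xArr | ForEach-Object { $_.Split(',').Count } | Measure-Object -Maximum).Maximum

# Hypothetical names: four fields per Backup-ID, then four fields per copy (assumption).
$fixed   = 'BackupID', 'Policy', 'PrimaryCopy', 'Expires'
$perCopy = 'CopyNumber', 'FragmentSizeKB', 'Expires', 'MediaID'

$header = for ($i = 0; $i -lt $maxCols; $i++) {
    if ($i -lt $fixed.Count) {
        $fixed[$i]
    } else {
        # Suffix recurring names with the position of the copy block (1, 2, 3, ...).
        $offset = $i - $fixed.Count
        $slot   = [int][math]::Floor($offset / $perCopy.Count) + 1
        '{0}_{1}' -f $perCopy[$offset % $perCopy.Count], $slot
    }
}

# Rows with fewer copies simply get empty values in the trailing columns.
$xCsv = $xArr | ConvertFrom-Csv -Header $header
```

The resulting $xCsv can be piped to Export-Csv exactly as above. Note that the numeric suffix is the position of the copy block within the row, not the value of Copy Number itself.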

Upvotes: 2

marsze

Reputation: 17055

This maybe?

& {
    $current = $null
    switch -regex -file 'C:\test.txt' {
        '^(Backup-ID|Policy|Primary Copy|Expires|Copy Number|Fragment Size \(KB\)|MediaID):\s+(.*)' {
            if ($matches[1] -eq "Backup-ID") {
                if ($current) { $current.ToString() }
                $current = [Text.StringBuilder]::new()
                [void]$current.Append($matches[2])
            }
            else {
                [void]$current.Append(",").Append($matches[2])
            }
        }
    }
    if ($current) { $current.ToString() }   # emit the last record, if any was found
}

Upvotes: 1

user6811411


Using a better Pattern

$Pattern = '^Backup-ID|^Policy|^Primary Copy|^Expires|^Copy Number|^Fragment Size|^MediaID'

and a regex lookahead to split the output at each Backup-ID

(Get-Content .\test.txt|Select-String -Pattern $Pattern|Out-String) -split "(?=Backup-ID)"|ForEach-Object {
    (($_ -split "`r?`n" | %{($_ -split ":\s+",2)[1]}) -join ',').Trim(',')
}

hostname01,VM_weekly,23,1/5/2024 3:19:13 AM,2,6188832,1/5/2024 3:19:13 AM,XXX122,3,6188832,1/5/2024 3:19:13 AM,XXX134
hostname02,VM_weekly2,24,1/5/2024 3:19:13 AM,2,6188832,1/5/2024 3:19:13 AM,XXX244,3,6188832,1/5/2024 3:19:13 AM,XXX199,4,6188832,1/5/2024 3:19:13 AM,XXX177

Upvotes: 2
