RASmith

Reputation: 3

PowerShell: text file to CSV file conversion, line by line

Task: Convert screen-scraped text into CSV for import into Excel.

Environment: Windows 11, Notepad, PowerShell ISE, PowerShell v5.1.

Text format: One line per entry with CR LF line endings (checked with SciTE). The text has a repeating 7-line pattern. Each block ends with a line containing a unique word and is followed by empty (CR LF) lines; the number of empty lines is inconsistent.

My solution: Read each line. If the line contains the unique word, append it to the output file, keeping its CR LF. If the line is blank, do nothing. If the line does not contain the unique word, replace its CR LF with a comma and append it to the output file.

Input file sample:

28
--
123.123.123.123 - fgh1
00:00:00:00:00:00
someword
 anotherword
INDEX word: 11


29
--
123.123.123.124 - fgh1
00:00:00:00:00:00
someword "INDEX
 anotherword
INDEX word: 11

Output file format desired:

28,--,123.123.123.123 - fgh1,00:00:00:00:00:00,someword, anotherword,INDEX word: 11
29,--,123.123.123.124 - fgh1,00:00:00:00:00:00,someword, anotherword,INDEX word: 11

This PS script recognizes the unique word(s):

foreach ($line in [System.IO.File]::ReadLines('C:\InFile.txt'))
{
    if ($line -match "INDEX") { Write-Host "Line contains INDEX" }
    elseif ($line -notmatch "INDEX") { Write-Host "Line does not contain INDEX" }
}

However, I cannot get the following script to replace the CR LF with commas and produce the file format I desire. The script just writes out the original file format to the OutFile.

foreach ($line in [System.IO.File]::ReadLines('C:\InFile.txt'))
{
    if ($line -match "INDEX") { Out-File -Append -FilePath C:\OutFile.txt -InputObject $line }
    elseif ($line -notmatch "INDEX") { Out-File -Append -FilePath C:\OutFile.txt -InputObject ($line -replace '\r?\n', ',') }
}

Note on why I'm using the [System.IO.File]::ReadLines method: I couldn't get proper unique word matches using:

$FilePath = "C:\InFile.txt"
$FileContents = Get-Content -Path $FilePath -Raw | Out-String   # tried with and without Out-String
foreach ($Line in $FileContents) {
    if ($Line -match "INDEX") { Write-Host "Line contains INDEX" }
    elseif ($Line -notmatch "INDEX") { Write-Host "Line does not contain INDEX" }
}

Upvotes: 0

Views: 197

Answers (2)

Theo

Reputation: 61208

You could parse the data from such a file like below:

$fileIn  = 'C:\inFile.txt'
$fileOut = 'C:\OutFile.txt'
$headers = (1..7 | ForEach-Object { "Column$_" } ) -join ','     # create a header line for the csv file

# read the file as single multiline string and parse the data
$data = (Get-Content -Path $fileIn -Raw) -split '(?:\r?\n){2,}' |  # split the text on runs of two or more Newlines
Where-Object { $_ -match '\S' } |                                  # ignore empty or whitespace-only parts
ForEach-Object { ($_.Trim() -split '\r?\n') -join ',' }            # trim stray Newlines, split each part on Newline and join with commas

# write the headers to csv file, then the data
$headers | Set-Content -Path $fileOut
$data | Add-Content -Path $fileOut
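
Joining the fields with a plain comma is only safe while no field itself contains a comma or a double quote. If that cannot be guaranteed, one option is to build objects and let ConvertTo-Csv (or Export-Csv) do the quoting. The following is a minimal sketch under stated assumptions, not a definitive implementation: the inline `$text` stands in for the result of Get-Content -Raw, the value `some,word` is a made-up field added only to show the quoting, and the `ColumnN` property names are placeholders.

```powershell
# Inline sample standing in for: Get-Content -Path 'C:\inFile.txt' -Raw
# 'some,word' is a hypothetical field containing a comma.
$text = @(
    '28', '--', '123.123.123.123 - fgh1', '00:00:00:00:00:00', 'someword', ' anotherword', 'INDEX word: 11',
    '', '',
    '29', '--', '123.123.123.124 - fgh1', '00:00:00:00:00:00', 'some,word', ' anotherword', 'INDEX word: 11'
) -join "`r`n"

# Split into blocks on runs of two or more Newlines, then turn each block into an object
$records = $text -split '(?:\r?\n){2,}' |
    Where-Object { $_ -match '\S' } |
    ForEach-Object {
        $fields = $_.Trim() -split '\r?\n'
        $row = [ordered]@{}
        for ($i = 0; $i -lt $fields.Count; $i++) { $row["Column$($i + 1)"] = $fields[$i] }
        [pscustomobject]$row
    }

# ConvertTo-Csv emits a header line plus one quoted line per object
$csvLines = $records | ConvertTo-Csv -NoTypeInformation

# To write a file instead:
# $records | Export-Csv -Path 'C:\OutFile.csv' -NoTypeInformation
```

Because the cmdlet quotes the fields, the comma inside `some,word` stays within a single Excel column.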

P.S. Your example does not make it clear to me why you need to look for the word INDEX, as both data blocks contain it.

Upvotes: 0

Toni

Reputation: 1826

This may help, but you will still need to define a header line:

$content = Get-Content C:\inFile.txt | Where-Object { $_ }   # drop empty lines
$list = [System.Collections.Generic.List[string]]::New()
$newContent = @(
    foreach ($line in $content){
        $list.add($line)
        If ($line -match '^index'){
            $list -join ','
            $list = [System.Collections.Generic.List[string]]::New()
        }
    }
)
$newContent | set-content .\myCsv.csv
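
The same collect-and-flush idea can be wrapped in a small function that also emits the header line. This is only a sketch: the function name `Join-RecordLines`, its parameters, and the `ColumnN` header names are hypothetical, and it assumes every record ends with a line that starts with INDEX. It also illustrates why replacing `\r?\n` inside the loop has no effect: lines returned by Get-Content or ReadLines no longer carry their CR LF, so the fields have to be collected and joined explicitly.

```powershell
function Join-RecordLines {
    param(
        [string[]]$Line,                  # input lines, already stripped of CR LF
        [string]$Terminator = '^INDEX',   # regex that marks the last line of a record
        [int]$ColumnCount = 7             # number of header columns to emit
    )
    # Emit a generic header line first
    (1..$ColumnCount | ForEach-Object { "Column$_" }) -join ','

    $fields = [System.Collections.Generic.List[string]]::new()
    foreach ($l in $Line) {
        if ([string]::IsNullOrWhiteSpace($l)) { continue }   # skip blank lines
        $fields.Add($l)
        if ($l -match $Terminator) {
            $fields -join ','    # record complete: emit it as one CSV line
            $fields.Clear()
        }
    }
}

# Sample lines taken from the question's first data block, plus blank lines
$sample = @(
    '28', '--', '123.123.123.123 - fgh1', '00:00:00:00:00:00',
    'someword', ' anotherword', 'INDEX word: 11', '', ''
)
$result = Join-RecordLines -Line $sample
```

With a real file, something like `Join-RecordLines -Line (Get-Content C:\inFile.txt) | Set-Content .\myCsv.csv` should produce the header followed by one line per record.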

Upvotes: 0
