Extract columns from pipe delimited text file and output using powershell

Question

I need to extract 2 columns (column 147 and column 148) from a huge pipe delimited file (100 MB-3GB). I cannot open in excel or if I do it takes forever. Once the columns are extracted, I need to output the top N rows into another text file but I need to format them so they can be put into SQL so I can query a table. However, there can be many concatenated fields, separated by commas, in the 2 extracted columns (LOGICAL_KEY_CONCAT_FIELD_CLM, LOGICAL_KEY_CONCAT_FIELD_CLM2) and their format can vary from file to file. I need this to be a repeating process.

For example, pipe delimited file looks like:

|||…|LOGICAL_KEY_CONCAT_FIELD_CLM|LOGICAL_KEY_CONCAT_FIELD_CLM2|||||…
|||…|CLM,KEY,1|CLM,FINANCIAL,KEY,1,2018-11-30|||…
|||…|CLM,KEY,2|CLM,FINANCIAL,KEY,2,2018-11-30|||…
|||…|CLM,KEY,3|CLM,FINANCIAL,KEY,3,2018-11-30|||…
.
.
.

OUTPUT:

(LOGICAL_KEY_CONCAT_FIELD_CLM = 'CLM,KEY,1' AND 
LOGICAL_KEY_CONCAT_FIELD_CLM2 = 'CLM,FINANCIAL,KEY,1,2018-11-30')
OR (LOGICAL_KEY_CONCAT_FIELD_CLM = 'CLM,KEY,2' AND 
LOGICAL_KEY_CONCAT_FIELD_CLM2 = 'CLM,FINANCIAL,KEY,2,2018-11-30')
OR (LOGICAL_KEY_CONCAT_FIELD_CLM = 'CLM,KEY,3' AND 
LOGICAL_KEY_CONCAT_FIELD_CLM2 = 'CLM,FINANCIAL,KEY,3,2018-11-30')

Basically, I want to run a query using the data from the columns in the file and I want to ensure the combination of LOGICAL_KEY_CONCAT_FIELD_CLM and LOGICAL_KEY_CONCAT_FIELD_CLM2 are in the table. I don’t need SELECT * FROM Table WHERE because the Table varies depending on the incoming source.

I am using the following code but it doesn’t print the columns in the output file, just the headers:

Get-Content "\LocationOfFile\CLAIM_20190103T17053920.txt" | select-object 
LOGICAL_KEY_CONCAT_FIELD_CLM,LOGICAL_KEY_CONCAT_FIELD_CLM2 -First 10 | Out- 
File "P:\PDS_QA\TestFile\Output.txt"

marsze · Accepted Answer

Alternative solution, with pure built-in PowerShell cmdlets. I can't say how good the performance is for large files. Probably worse than my other answer. You should try it out.

Also, this requires the input CSV to have a header with unique names for every column.

$infile = "\LocationOfFile\CLAIM_20190103T17053920.txt"
$outfile = "P:\PDS_QA\TestFile\Output.txt"
"SELECT * FROM Table WHERE 1=0" | Out-File $outfile
Import-Csv $infile -Delimiter "|" | foreach {
    "OR (LOGICAL_KEY_CONCAT_FIELD_CLM = '{0}' AND LOGICAL_KEY_CONCAT_FIELD_CLM2 = '{1}')" -f (
    $_.LOGICAL_KEY_CONCAT_FIELD_CLM,
    $_.LOGICAL_KEY_CONCAT_FIELD_CLM2)
} | Out-File $outfile -Append

Extract columns from pipe delimited text file and output using powershell

Answers (2)

Related Questions