Yuukiii

Reputation: 23

Transform exported CSV (includes embedded JSON) and save relevant columns and keys in new CSV file - Powershell

I am currently writing a script for the following problem:

Initial Problem

Data was exported from an audit system into a CSV. The CSV consists of several columns, one of which contains JSON data. Sadly, there aren't many options to influence the structure of the export. Since the amount of data makes it tough to filter and analyse, the exported CSVs need to be transformed (when needed) so that only the relevant columns and JSON keys remain in the new to-be-exported CSV. It has to be a new CSV because the file potentially needs to be shared. A to-be-imported text file contains the relevant JSON keys that should remain in the exported CSV.

About the JSON: the keys can vary based on the events that are exported. Let's say there are 3-4 different variants; the imported text file contains, for all 3-4 variants, the relevant subkeys that need to be included as new columns in the export. If a subkey does not exist, it's okay for that particular column to be empty in the export.
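As a small illustration of that behaviour (the JSON keys here are made up), a subkey that is absent from a given event variant simply resolves to `$null` when accessed, which later becomes an empty cell in the export:

```powershell
# Hypothetical AuditData payloads for two event variants
$eventA = '{"Id":"42","Workload":"Exchange","ResultStatus":"Succeeded"}' | ConvertFrom-Json
$eventB = '{"Id":"43","Workload":"SharePoint"}' | ConvertFrom-Json

$eventA.ResultStatus   # "Succeeded"
$eventB.ResultStatus   # $null - the key is absent in this variant, so the column stays empty
```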

Initial Thoughts

a) Import the CSV and the file listing the relevant JSON keys that should be kept
b) Expand the JSON
c) Select the JSON entries that are relevant
d) Merge everything into new columns
e) Export again into a new file
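The steps above can be sketched as a single pipeline (the paths and the one-key-per-line template format are assumptions):

```powershell
$keys = Get-Content "C:\temp\templateTest.txt"              # a) relevant JSON keys
Import-Csv "C:\temp\inputTest.csv" | ForEach-Object {
    $json = $_.AuditData | ConvertFrom-Json                 # b) expand the JSON
    $row  = [ordered]@{ CreationDate = $_.CreationDate }    # d) start the new column set
    foreach ($k in $keys) { $row[$k] = $json.$k }           # c)+d) keep only relevant keys
    [pscustomobject]$row
} | Export-Csv "C:\temp\outputTest.csv" -NoTypeInformation  # e) export into a new file
```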

What Questions are open / Where are the Problems?

I wrote some code (I started my PS experience just 2 days ago) and encountered the following questions:

  1. Are there any recommendations to improve the code? Given my very recent PS adventure, there are probably many obvious things that can be improved.
  2. Is there a way to export straight into CSV format without the manual -join followed by Out-File? I noticed that for my final test cases (I cannot share those because the data is extremely hard to anonymize) I didn't manage to come up with a delimiter (tried ",", ";" and "`t") that doesn't appear inside some of the imported cells. Excel (when importing from text), though, doesn't seem to have an issue loading and parsing the data as CSV and recognizing the columns and boundaries correctly.
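Regarding question 2: Export-Csv quotes each field itself, so a delimiter appearing inside a cell is not a problem (the sample data below is made up):

```powershell
# A value containing the delimiter is wrapped in quotes by Export-Csv automatically
[pscustomobject]@{ UserIds = "Doe, John"; Operations = "FileAccessed" } |
    Export-Csv "C:\temp\demo.csv" -NoTypeInformation
# demo.csv:
#   "UserIds","Operations"
#   "Doe, John","FileAccessed"
```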

Happy to hear any tips!

Current Code

### Configure variables

$inputPath = "C:\Users\test\Downloads\inputTest.csv"
$formatTemplate = "C:\Users\test\Downloads\templateTest.txt"
$outputPath = "C:\Users\test\Downloads\outputTest.csv"

### Load the columns from template file to perform transformation depending on the required AuditData Fields. The file contains a list of relevant JSON keys from the Audit Data columns

$selectedAuditDataFields = Get-Content $formatTemplate

### Load CSV, select needed columns and expand the JSON entries within the AuditData column

$importCsvCompact = Import-Csv -Path $inputPath -Delimiter "," |
    Select-Object -Property CreationDate, UserIds, Operations,
        @{ Name = "AuditData"; Expression = { $_.AuditData | ConvertFrom-Json } }

### Calculate the number of Rows (import and export CSV have same number of rows) and Columns (3 standard columns plus template columns) for the CSV to be exported

$exportCsvNumberOfRows = $importCsvCompact.Count
$exportCsvNumberOfColumns = $selectedAuditDataFields.Length + 3

### Add header to to-be-exported-CSV

$header = [object[]]::new($exportCsvNumberOfColumns);

$header[0] = "CreationDate"
$header[1] = "UserIds"
$header[2] = "Operations"

for($columnIncrement = 3; $columnIncrement -ne $exportCsvNumberOfColumns; $columnIncrement++) {
        $header[$columnIncrement] = ($selectedAuditDataFields[$columnIncrement-3])
}

$toAppend = $header -join ","
$toAppend | Out-File -FilePath $outputPath -Append

### initiate array for each transformed row and initiate counter of current row

$processingRowCounter = 0

### traverse each row of the CSV import and setup the new column structure
### connect the 3 standard columns with a subset of the expanded JSON entries (based on the imported template)

$importCsvCompact | ForEach-Object {

    $csvArrayColumn = [object[]]::new($exportCsvNumberOfColumns);

    $csvArrayColumn[0] = $_.CreationDate
    $csvArrayColumn[1] = $_.UserIds
    $csvArrayColumn[2] = $_.Operations

    for($columnIncrement = 3; $columnIncrement -ne $exportCsvNumberOfColumns; $columnIncrement++) {
        # Use $_ (the current row) instead of indexing the whole import by row counter;
        # member enumeration skips missing keys, so counter-based indexing shifts values
        $csvArrayColumn[$columnIncrement] = $_.AuditData.($selectedAuditDataFields[$columnIncrement-3])
    }

    $processingRowCounter++

    $directExport = $csvArrayColumn -join ","
    $directExport | Out-File -FilePath $outputPath -Append
    
    Write-Host "Processed $processingRowCounter Rows..."
}

Testfiles

templateTest.txt https://easyupload.io/vx7k75

inputTest.csv https://easyupload.io/ab77q9

Current Version based on Feedback

### Configure variables

$inputPath = "C:\Users\forstchr\Downloads\inputTest.csv"
$formatTemplate = "C:\Users\forstchr\Downloads\templateTest.txt"
$outputPath = "C:\Users\forstchr\Downloads\outputTest.csv"



### Load the columns from template file to perform transformation depending on the required AuditData Fields. The file contains a list of relevant JSON keys from the Audit Data columns

$selectedAuditDataFields = Get-Content $formatTemplate



### Calculate the number of audit columns (template keys) for the CSV to be exported

$exportCsvNumberOfAuditColumns = $selectedAuditDataFields.Length

### Load CSV, select needed columns and expand the JSON entries within the AuditData column

Import-Csv -Path $inputPath -Delimiter "," |
    Select-Object -Property CreationDate, UserIds, Operations,
        @{ Name = "AuditData"; Expression = { $_.AuditData | ConvertFrom-Json } } |
    ForEach-Object {

        # Build the row as an ordered hashtable so the AuditData subkeys from the
        # template can be added in a loop; a missing subkey stays $null and becomes
        # an empty cell in the export
        $row = [ordered]@{
            'CreationDate' = $_.CreationDate
            'UserIds'      = $_.UserIds
            'Operations'   = $_.Operations
        }

        foreach ($auditField in $selectedAuditDataFields) {
            $row[$auditField] = $_.AuditData.$auditField
        }

        [pscustomobject]$row
    } | Export-Csv $outputPath -NoTypeInformation

Upvotes: 0

Views: 162

Answers (1)

Walter Mitty

Reputation: 18940

I have had to produce a "cleansed" CSV file in one project. My general approach was as follows: import the existing CSV data and send it through the pipeline. In ForEach-Object, do some processing, storing the results in variables. The last step of processing creates a hashtable typecast as a pscustomobject, and this result is passed along the pipeline. The output of the pipeline is fed to Export-Csv. Export-Csv does all the joining and the commas for me, and also encloses the output fields in quotes, making them strings.

Here is a code snippet that illustrates the approach. The cleansing consists of reformatting dates so that they use a standard 14 digit format, and reformatting currency amounts so that they don't contain dollar signs. But that is not relevant to you.

Import-Csv checking.csv | % {

    $balance += [decimal](Get-Amount $_.AMOUNT)

    [pscustomobject]@{
        'TRNTYPE'  = Get-Type $_.AMOUNT
        'DTPOSTED' = (Get-Date $_.DATE).ToString('yyyyMMddHHmmss')
        'TRNAMT'   = Get-Amount $_.AMOUNT
        'FITID'    = $fitid++     # this is a stopgap
        'NAME'     = $_.DESCRIPTION
        'MEMO'     = $memo
    }
} |
Export-Csv transactions.csv

Get-Type is a function that yields 'CREDIT' or 'DEBIT' depending on the sign of the amount. Get-Amount is a function that returns a numeric amount without commas and dollar signs. Both functions are defined at the beginning of the script. Note that when you call a PowerShell function, there are no parentheses involved. That was a big jolt to me, but it's actually a feature of PowerShell.
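To illustrate the parentheses point (the Get-Amount body here is an assumption, since the answer doesn't show it):

```powershell
# Strip '$' and ',' from a currency string and return a decimal
function Get-Amount ($text) { [decimal]($text -replace '[$,]') }

Get-Amount '$1,234.56'      # arguments are space-separated: returns 1234.56
# Get-Amount('$1,234.56')   # the parentheses here just group the single argument;
#                           # with two arguments they would pass one array instead
```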

Upvotes: 1
