Reputation: 23
I am currently writing a script for the following problem:
Initial Problem
Data was exported from an Audit System into a CSV. The CSV itself consists of several columns of which one column has JSON data inside. Sadly there arent many options to influence the export / structure of the export. Since the amount of data included there is tough to filter and to analyse, the exported CSVs (when needed) need to be transformed so that only relevant columns and JSON keys are remaining within the new to-be-exported CSV. It has to be a new CSV as the file needs to potentially be shared. A textfile to-be-imported contains the relevant JSON keys that should remain in the to-be-exported CSV.
About the JSON: The keys can vary based on the events that are exported. Lets say there are 3-4 different variants but the textfile to be imported contains for all 3-4 keys the relevant subkeys that need to be included as new column in the export. If the subkey does not exist its okay if that particular column is empty in the export.
Initial Thoughts
a) Import the CSV and the file that is listing the relevant JSON keys that should be kept b) Expand the JSON c) Select the JSON entries that are relevant d) Merge everything into new columns e) Export again into a new file
What Questions are open / Where are the Problems?
I was writing some piece of code (I started my PS experience just 2 days ago) and encountered/wondering the following:
-join
and then using Out-File?
. I noticed that for my final test cases (I cannot share those because the data is extremely hard to anonymize) I didnt manage to come up with a delimiter (tried ",", ";" and "´t") that doesnt seem to be included in parts of the imported cells. Excel (when importing from text) doesnt seem to have an issue tho loading and parsing the data as CSV and to recognize the columns and boundaries correctly.Happy to hear any tips!
Current Code
### Configure variables
$inputPath = "C:\Users\test\Downloads\inputTest.csv"
$formatTemplate = "C:\Users\test\Downloads\templateTest.txt"
$outputPath = "C:\Users\test\Downloads\outputTest.csv"
### Load the columns from template file to perform transformation depending on the required AuditData Fields. The file contains a list of relevant JSON keys from the Audit Data columns
$selectedAuditDataFields = Get-Content $formatTemplate
### Load CSV, select needed columns and expand the JSON entries within the AuditData column
$importCsvCompact = Import-Csv -Path $inputPath -Delimiter "," | Select-Object -Property CreationDate, UserIds, Operations, @{name = "AuditData"; Expression = {$_.AuditData | ConvertFrom-Json }}
### Calculate the number of Rows (import and export CSV have same number of rows) and Columns (3 standard columns plus template columns) for the CSV to be exported
$exportCsvNumberOfRows = $importCsvCompact.Count
$exportCsvNumberOfColumns = $selectedAuditDataFields.Length + 3
### Add header to to-be-exported-CSV
$header = [object[]]::new($exportCsvNumberOfColumns);
$header[0] = "CreationDate"
$header[1] = "UserIds"
$header[2] = "Operations"
for($columnIncrement = 3; $columnIncrement -ne $exportCsvNumberOfColumns; $columnIncrement++) {
$header[$columnIncrement] = ($selectedAuditDataFields[$columnIncrement-3])
}
$toAppend = $header -join ","
$toAppend | Out-File -FilePath $outputPath -Append
### initiate array for each transformed row and initiate counter of current row
$processingRowCounter = 0
### traverse each row of the CSV import and setup the new column structure
### connect the 3 standard columns with a subset of the expanded JSON entries (based on the imported template)
$importCsvCompact | ForEach-Object {
$csvArrayColumn = [object[]]::new($exportCsvNumberOfColumns);
$csvArrayColumn[0] = $importCsvCompact.CreationDate[$processingRowCounter]
$csvArrayColumn[1] = $importCsvCompact.UserIds[$processingRowCounter]
$csvArrayColumn[2] = $importCsvCompact.Operations[$processingRowCounter]
for($columnIncrement = 3; $columnIncrement -ne $exportCsvNumberOfColumns; $columnIncrement++) {
$csvArrayColumn[$columnIncrement] = $importCsvCompact.AuditData.($selectedAuditDataFields[$columnIncrement-3])[$processingRowCounter]
}
$processingRowCounter++
$directExport = $csvArrayColumn -join ","
$directExport | Out-File -FilePath $outputPath -Append
Write-Host "Processed $processingRowCounter Rows..."
}
Testfiles
templateTest.txt https://easyupload.io/vx7k75
inputTest.csv https://easyupload.io/ab77q9
Current Version based on Feedback
### Configure variables
$inputPath = "C:\Users\forstchr\Downloads\inputTest.csv"
$formatTemplate = "C:\Users\forstchr\Downloads\templateTest.txt"
$outputPath = "C:\Users\forstchr\Downloads\outputTest.csv"
### Load the columns from template file to perform transformation depending on the required AuditData Fields. The file contains a list of relevant JSON keys from the Audit Data columns
$selectedAuditDataFields = Get-Content $formatTemplate
### Calculate the number of Rows (import and export CSV have same number of rows) and Columns (3 standard columns plus template columns) for the CSV to be exported
$exportCsvNumberOfRows = $importCsvCompact.Count
$exportCsvNumberOfAuditColumns = $selectedAuditDataFields.Length
###Load CSV, select needed columns and expand the JSON entries within the AuditData column
Import-csv -Path $inputPath -Delimiter "," | Select-Object -Property CreationDate, UserIds, Operations, @{name = "AuditData"; Expression = {$_.AuditData | ConvertFrom-Json }} | % {
[pscustomobject]@{
'CreationDate' = $_.CreationDate
'UserIds' = $_.UserIds
'Operations' = $_.Operations
# the next part is not correct but hopefully displays what I am trying to achieve with the JSON in the AuditData column
for($auditFieldIncrement = 0; $auditFieldIncrement -ne $exportCsvNumberOfAuditColumn; $auditFieldIncrement++) {
'$selectedAuditDataFields[$auditFieldIncrement]' = $_.AuditData.($selectedAuditDataFields[$auditFieldIncrement])
}
}
} | Export-csv $outputPath
Upvotes: 0
Views: 162
Reputation: 18940
I have had to produce a "cleansed" csv file in one project. My general approach was as follows: import the existing csv data, and send it through the pipeline. Foreach-object, do some processing, storing the results in variables. The last step in processing creates a hashtable typecast as a pscustomobject, and this result in passed through the pipeline. The output of the second pipeline is fed to Export-csv. Export-csv does all the joining and the commas for me, and also encloses the output fields in quotes, making them strings.
Here is a code snippet that illustrates the approach. The cleansing consists of reformatting dates so that they use a standard 14 digit format, and reformatting currency amounts so that they don't contain dollar signs. But that is not relevant to you.
Import-csv checking.csv | % {
$balance += [decimal]$(Get-Amount $_.AMOUNT)
[pscustomobject]@{
'TRNTYPE' = Get-Type $_.AMOUNT
'DTPOSTED' = (Get-Date $_.DATE).Tostring('yyyyMMddHHmmss')
'TRNAMT' = Get-Amount $_.AMOUNT
'FITID' = $fitid++ #this is a stopgap
'NAME' = $_.DESCRIPTION
'MEMO' = $memo
}
} |
Export-csv transactions.csv
Get-Type is a function that yields 'CREDIT' or 'DEBIT' depending on the sign of the amount. Get-Amount is a function that gives a numeric amount without commas and dollar signs. Those functions are defined at the beginning of the script. Note that, when you call a powershell function, there are no parentheses involved. That was a big jolt to me, but it's actually a feature of powershell.
Upvotes: 1