Reputation: 95
I am trying to split one NDJSON file into multiple NDJSON files. I am able to consume and split the file, but the resulting files are written as a pretty-printed JSON array instead of NDJSON. Is it possible to output NDJSON directly, or do I have to do some string manipulation?
My input file test.json:
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38116"}
My PowerShell script so far:
$json = (Get-Content C:\Temp\test.json) | ConvertFrom-Json
$jnl_list = $json.JRNAL_NO | select -Unique
ForEach ($jnl in $jnl_list) {
    $Array = $json | Where-Object {$_.JRNAL_NO -eq $jnl}
    $res = ($Array | ConvertTo-Json)
    $res | Out-File -FilePath .\JNL\$($jnl).json
}
My current output, for example the 38115.json file:
[
  {
    "PERIOD": "2024004",
    "JRNAL_NO": "38115"
  },
  {
    "PERIOD": "2024004",
    "JRNAL_NO": "38115"
  },
  {
    "PERIOD": "2024004",
    "JRNAL_NO": "38115"
  },
  {
    "PERIOD": "2024004",
    "JRNAL_NO": "38115"
  }
]
I need the output file to be NDJSON, basically the same format as the input file. 38115.json should be:
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
38116.json should be:
{"PERIOD":"2024004","JRNAL_NO":"38116"}
Upvotes: 3
Views: 117
Reputation: 23663
To complement the helpful answer from Santiago Squarzon with a presumably more performant solution:
You could also consider distributing the original lines ("same format as the input file") one by one on the fly, using the steppable pipeline as in this example, where ConvertFrom-Json is only used to determine the "JRNAL_NO" value:
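$Pipeline = @{}
Get-Content .\test.json |
    ForEach-Object -Process {
        # Parse the line only to read its JRNAL_NO value
        $Jnl = ($_ | ConvertFrom-Json).JRNAL_NO
        if (!$Pipeline.Contains($Jnl)) {
            # First time this JRNAL_NO is seen: create and start a Set-Content pipeline for it
            $Pipeline[$Jnl] = { Set-Content .\$Jnl.json }.GetSteppablePipeline()
            $Pipeline[$Jnl].Begin($True)
        }
        # Write the original, unmodified line to that journal's file
        $Pipeline[$Jnl].Process($_)
    } -End {
        # End every pipeline so each Set-Content finishes and closes its file
        foreach ($Key in $Pipeline.Keys) { $Pipeline[$Key].End() }
    }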
For more details and background, see: Mastering the (steppable) pipeline
Upvotes: 1
Reputation: 60045
If I understand correctly what you're looking for, the code may be greatly simplified by using Group-Object.
To get the NDJSON format, enumerate each object in the group (the .Group property), pass it to ConvertTo-Json -Compress, and then send that output to your file.
$json = Get-Content C:\Temp\test.json | ConvertFrom-Json
foreach ($group in $json | Group-Object JRNAL_NO) {
    $group.Group |
        ForEach-Object { $_ | ConvertTo-Json -Compress } |
        Set-Content ".\JNL\$($group.Name).json"
}
With the sample data, the code above creates 2 files:
38115.json
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
{"PERIOD":"2024004","JRNAL_NO":"38115"}
38116.json
{"PERIOD":"2024004","JRNAL_NO":"38116"}
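If you'd rather keep the original lines byte-for-byte (as the steppable-pipeline answer does) but still use Group-Object, a minimal sketch is to group the raw lines by a calculated key and parse each line only to read JRNAL_NO. This assumes every line is one complete JSON object; the sample lines are inlined only so the snippet runs on its own, and the .\JNL folder is created if it is missing.

```powershell
# Sample input, inlined so the snippet is self-contained.
# In practice: $lines = Get-Content C:\Temp\test.json
$lines = @(
    '{"PERIOD":"2024004","JRNAL_NO":"38115"}'
    '{"PERIOD":"2024004","JRNAL_NO":"38115"}'
    '{"PERIOD":"2024004","JRNAL_NO":"38115"}'
    '{"PERIOD":"2024004","JRNAL_NO":"38115"}'
    '{"PERIOD":"2024004","JRNAL_NO":"38116"}'
)

# Make sure the output folder exists (-Force: no error if it already does)
$outDir = '.\JNL'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

# Group the raw lines; each line is parsed only to read its JRNAL_NO key
$lines |
    Group-Object { ($_ | ConvertFrom-Json).JRNAL_NO } |
    ForEach-Object {
        # $_.Group holds the original, unmodified lines for this key
        $_.Group | Set-Content -Path (Join-Path $outDir "$($_.Name).json")
    }
```

Note that Group-Object collects all of its input before emitting any groups, so unlike the steppable-pipeline approach this holds the whole file in memory; for moderately sized files that is usually fine.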
Upvotes: 4