Reputation: 3484
Given a pipe-delimited file with an unknown number of rows, I need to convert it to multiple JSON documents, each document representing an array of the employees that belong to a single department.
The assumption is that all employees belonging to a specific department are listed together; there will never be a scenario where employees of the same department are not grouped consecutively.
While the sample input provided below is small, the actual files received are quite large, up to 100 MB for example.
What is the best way, both memory- and CPU-wise, to achieve this?
Sample Input:
Department|First Name|Last Name|Employee ID|Role
Accounting|Mark|Johnson|123|Manager
Accounting|John|Wayne|345|Sr. Accountant
Accounting|Marky|Mark|413|Jr. Accountant
HR|Susie|Johnson|542|Manager
HR|Lara|Wayne|4134|HR Rep
HR|Kira|Mark|642|Consultant
Sample Output:
accounting_employees.json
[
{
"firstName":"Mark",
"lastName":"Johnson",
"employeeId":"123",
"role":"Manager"
},
{
"firstName":"John",
"lastName":"Wayne",
"employeeId":"345",
"role":"Sr. Accountant"
},
{
"firstName":"Marky",
"lastName":"Mark",
"employeeId":"413",
"role":"Jr. Accountant"
}
]
Upvotes: 0
Views: 60
Reputation: 1815
You can read the file line by line and keep one output stream (e.g. a `DataOutputStream`) per department, writing to that department's JSON file.
When you open a file, write a `[`; when you close it (after the last row for that department, or on program exit), write a `]`.
For each row, look up the department's output stream and write that row's JSON object to it.
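Since the input guarantees each department's rows are contiguous, you only ever need one writer open at a time: close the current file when the department changes. A minimal sketch of that approach (class and method names are mine, and I assume field values contain no characters that need JSON escaping; a streaming JSON library such as Jackson would handle escaping properly):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SplitByDepartment {

    // Streams a pipe-delimited file and writes one JSON array file per
    // department. Memory stays constant regardless of input size because
    // only one line and one open writer are held at any moment.
    static void split(Path input, Path outDir) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(input)) {
            in.readLine(); // skip header row
            BufferedWriter out = null;
            String currentDept = null;
            boolean first = true;
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split("\\|", -1);
                if (f.length < 5) continue; // skip malformed rows
                if (!f[0].equals(currentDept)) {
                    // Department changed: close the previous array file
                    // and open a new one, e.g. accounting_employees.json.
                    if (out != null) { out.write("\n]"); out.close(); }
                    currentDept = f[0];
                    Path file = outDir.resolve(
                            currentDept.toLowerCase() + "_employees.json");
                    out = Files.newBufferedWriter(file);
                    out.write("[");
                    first = true;
                }
                out.write(first ? "\n" : ",\n");
                first = false;
                out.write(String.format(
                        "{\"firstName\":\"%s\",\"lastName\":\"%s\","
                        + "\"employeeId\":\"%s\",\"role\":\"%s\"}",
                        f[1], f[2], f[3], f[4]));
            }
            if (out != null) { out.write("\n]"); out.close(); }
        }
    }
}
```

If the grouping guarantee ever breaks, you would instead keep a `Map<String, BufferedWriter>` of open writers, at the cost of one file handle per department.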
Upvotes: 2