RKodakandla

Reputation: 3484

Memory efficient way to read and parse a flat file

Given a pipe-delimited file with an unknown number of rows, I need to convert it into multiple JSON documents, each one containing the array of employees that belong to a single department.

The assumption is that all employees belonging to a specific department are listed together; there will never be a scenario where rows for the same department are not lumped together.

While the sample input provided below is small, the actual files received are quite large (up to 100 MB, for example).

What is the best way, both memory- and CPU-wise, to achieve this?

Sample Input:

Department|First Name|Last Name|Employee ID|Role
Accounting|Mark|Johnson|123|Manager
Accounting|John|Wayne|345|Sr. Accountant
Accounting|Marky|Mark|413|Jr. Accountant
HR|Susie|Johnson|542|Manager
HR|Lara|Wayne|4134|HR Rep
HR|Kira|Mark|642|Consultant

Sample Output:

accounting_employees.json

[
  {
    "firstName":"Mark",
    "lastName":"Johnson",
    "employeeId":"123",
    "role":"Manager"
  },
  {
    "firstName":"John",
    "lastName":"Wayne",
    "employeeId":"345",
    "role":"Sr. Accountant"
  },
  {
    "firstName":"Marky",
    "lastName":"Mark",
    "employeeId":"413",
    "role":"Jr. Accountant"
  }
]

Upvotes: 0

Views: 60

Answers (1)

Dragonborn

Reputation: 1815

You can read the file line by line and keep a per-department DataOutputStream to its JSON file. When you open a file, write a [, and when you close it at program exit, write a ].

For each row, look up the department's output stream and write that row's JSON object to it, inserting a comma before every object after the first so the array stays valid.
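
A minimal sketch of that idea in Java 8+, assuming an input file named employees.txt (hypothetical) with the header row always present. It uses a BufferedWriter per department rather than the DataOutputStream mentioned above, since the output is plain text, and it does only minimal JSON string escaping:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class DepartmentSplitter {
    public static void main(String[] args) throws IOException {
        // One open writer per department; memory stays flat because rows are
        // never accumulated -- each line is written out as soon as it is read.
        Map<String, BufferedWriter> writers = new HashMap<>();
        Map<String, Boolean> needsComma = new HashMap<>();

        try (BufferedReader reader = Files.newBufferedReader(Paths.get("employees.txt"))) {
            reader.readLine(); // skip the header row
            String line;
            while ((line = reader.readLine()) != null) {
                String[] f = line.split("\\|", -1);
                if (f.length < 5) continue; // ignore malformed rows
                String dept = f[0];

                BufferedWriter w = writers.get(dept);
                if (w == null) {
                    // First row for this department: open its file and start the array.
                    w = Files.newBufferedWriter(
                            Paths.get(dept.toLowerCase() + "_employees.json"));
                    w.write("[");
                    writers.put(dept, w);
                    needsComma.put(dept, false);
                }

                if (needsComma.get(dept)) w.write(",");
                w.write(String.format(
                        "%n  {%n    \"firstName\":\"%s\",%n    \"lastName\":\"%s\","
                        + "%n    \"employeeId\":\"%s\",%n    \"role\":\"%s\"%n  }",
                        esc(f[1]), esc(f[2]), esc(f[3]), esc(f[4])));
                needsComma.put(dept, true);
            }
        } finally {
            // Close every array and stream on the way out.
            for (BufferedWriter w : writers.values()) {
                w.write(String.format("%n]%n"));
                w.close();
            }
        }
    }

    // Minimal escaping; real input may need full JSON escaping (control chars, etc.).
    private static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}

Since the question guarantees that all rows for a department are contiguous, a further refinement is to keep only one writer open at a time and close it (writing the trailing ]) as soon as the Department column changes; that caps open file handles at one no matter how many departments appear.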

Upvotes: 2
