likeGreen

Reputation: 1079

convert json array to csv files in Nifi

I have a complex JSON response from InvokeHTTP that needs to be converted into multiple CSV files using NiFi. For example, the data below needs to be converted into three CSV files. The first CSV will have ID, Owner, address; the second will have ID, contact, firstName; the third will have ID, workList, Owner.

{
  "data" : [ {
    "ID" : "789379832",
    "CreatedDate" : "2021-11-13T14:25:17-05:00",
    "Owner" : "abc",
    "workList" : "Sing, Play",
    "address" : "",
    "Phone" : "",
    "contact" : [ "Phone", "Email" ],
    "email" : "",
    "firstName" : "Kim",
    "notes" : "Hello abc",
    "yearCreated" : 2021
  }, {
    "ID" : "999079832",
    "CreatedDate" : "2021-04-13T09:36:51-04:00",
    "Owner" : "xyz",
    "workList" : "Sing",
    "address" : "80 Hastings Street",
    "Phone" : "",
    "contact" : [ "Phone", "Self" ],
    "email" : "",
    "firstName" : "Hari",
    "notes" : "Hello Hari",
    "yearCreated" : 2021
  } ]
}

Example csv1

ID,Owner,address
789379832,abc,
999079832,xyz,80 Hastings Street

I am using EvaluateJsonPath to extract the key "data" with $.data, then ReplaceText to make the JSON array the flowfile content, then SplitJson to split the array. After that I use a ConvertRecord processor with a JsonTreeReader and a CSVRecordSetWriter. I want to understand how to configure ConvertRecord here to produce each of the CSV files. Is my approach suboptimal? Should I use JoltTransformJSON instead?

Upvotes: 0

Views: 855

Answers (1)

Sdairs

Reputation: 2032

Edit:

Ok, so you want to make 3 CSV files, where each array item becomes a row in each CSV.

Very similar to my original suggestion, but using EvaluateJsonPath instead of ForkRecord.

You don't need to be using FlowFile attributes for any of this flow.

  1. EvaluateJsonPath with the Destination property set to flowfile-content. This avoids the need for ReplaceText.
  2. Add 3 ConvertRecord processors and route the EvaluateJsonPath success relationship into all 3 (each one gets a copy of the records, one for each of your 3 CSVs). Each ConvertRecord has a different CSV Writer with a reduced schema that drops the fields you don't want for that CSV. Then route each ConvertRecord's success relationship into PutFile.

Configure each CSVRecordSetWriter with a schema containing only the fields you want, e.g.

{
  "type": "record",
  "name": "CSV",
  "fields" : [
    {"name": "ID", "type": "string"},
    {"name": "Owner", "type": "string"},
    {"name": "address", "type": "string"}
  ]
}

So that it ignores all of the fields you don't want.
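Outside NiFi, the effect of the three reduced-schema writers can be sketched in Python. This is only an illustration of the transformation, not NiFi's implementation; the output filenames are hypothetical:

```python
import csv
import json

# The parsed "data" array from the question (what EvaluateJsonPath's
# $.data would leave as the flowfile content).
data = json.loads("""
[ {"ID": "789379832", "Owner": "abc", "workList": "Sing, Play",
   "address": "", "contact": ["Phone", "Email"], "firstName": "Kim"},
  {"ID": "999079832", "Owner": "xyz", "workList": "Sing",
   "address": "80 Hastings Street", "contact": ["Phone", "Self"],
   "firstName": "Hari"} ]
""")

# One (filename, reduced field list) pair per CSVRecordSetWriter.
targets = [
    ("csv1.csv", ["ID", "Owner", "address"]),
    ("csv2.csv", ["ID", "contact", "firstName"]),
    ("csv3.csv", ["ID", "workList", "Owner"]),
]

for filename, fields in targets:
    with open(filename, "w", newline="") as f:
        # extrasaction="ignore" drops fields outside the field list,
        # like a reduced Avro schema drops fields from the record.
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for record in data:
            writer.writerow(record)
```

Each array item becomes one row in each output, with only that writer's columns kept.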

Docs for avro schemas

Tool for validating avro schemas

Example Avro Schema with an Array (for the contacts field)

{
  "type": "record",
  "name": "CSV",
  "fields" : [
    {"name": "ID", "type": "string"},
    {"name": "contact", "type":{"type": "array", "items" : "string" }},
    {"name": "firstName", "type": "string"}
  ]
}
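For the third CSV from the question (ID, workList, Owner), the analogous reduced schema would be:

```json
{
  "type": "record",
  "name": "CSV",
  "fields" : [
    {"name": "ID", "type": "string"},
    {"name": "workList", "type": "string"},
    {"name": "Owner", "type": "string"}
  ]
}
```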

Original:

So firstly, when you say "below data needs to be converted to three csv files" - do you mean that each object in the array needs to create 3 CSV files? I.e. the example data would produce 3 CSVs for array item 1, and 3 CSVs for array item 2?

In this case, I think ForkRecord is a better fit for you, as you can achieve most of what you're doing in a single processor - you can process the elements of the array as individual records (see the Additional Details page for examples). With a JSON Reader and a CSV Writer you've already handled the conversion and splits.

After that, you can put down 3 ConvertRecord processors and route the ForkRecord success relationship into all 3 (each one gets a copy of the records, one for each of your 3 CSVs). Each ConvertRecord has a different CSV Writer with a reduced schema to drop the fields you don't want for that CSV. Then route each ConvertRecord's success relationship into PutFile.

Upvotes: 1
