Reputation: 1079
I have a complex JSON response from InvokeHTTP that needs to be converted to multiple CSV files using NiFi. For example, the data below needs to be converted to three CSV files. The first CSV file will have ID, Owner, address. The second CSV will have ID, contact, firstName. The third will have ID, workList, Owner.
{
  "data" : [ {
    "ID" : "789379832",
    "CreatedDate" : "2021-11-13T14:25:17-05:00",
    "Owner" : "abc",
    "workList" : "Sing, Play",
    "address" : "",
    "Phone" : "",
    "contact" : [ "Phone", "Email" ],
    "email" : "",
    "firstName" : "Kim",
    "notes" : "Hello abc",
    "yearCreated" : 2021
  }, {
    "ID" : "999079832",
    "CreatedDate" : "2021-04-13T09:36:51-04:00",
    "Owner" : "xyz",
    "workList" : "Sing",
    "address" : "80 Hastings Street",
    "Phone" : "",
    "contact" : [ "Phone", "Self" ],
    "email" : "",
    "firstName" : "Hari",
    "notes" : "Hello Hari",
    "yearCreated" : 2021
  } ]
}
Example csv1
ID,Owner,address
789379832,abc,
999079832,xyz,80 Hastings Street
I am using "EvaluateJsonPath" to extract the "data" key with $.data, then "ReplaceText" to make the JSON array the FlowFile content, and then "SplitJson" to split the array. Then I am using the "ConvertRecord" processor with a JsonTreeReader and a CSVRecordSetWriter. I want to understand how to set attribute values for each CSV file in ConvertRecord here. Is my approach suboptimal? Should I use JoltTransformJSON instead?
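For reference, the intended result can be sketched outside NiFi with plain Python. This is just an illustration of the three projections, using the field names from the sample data above; the array field is joined with commas here as an assumption, since how NiFi renders arrays depends on the writer configuration.

```python
import csv
import io
import json

# Sample response matching the structure above (trimmed to the fields used)
payload = json.loads("""
{
  "data": [
    {"ID": "789379832", "Owner": "abc", "workList": "Sing, Play",
     "address": "", "contact": ["Phone", "Email"], "firstName": "Kim"},
    {"ID": "999079832", "Owner": "xyz", "workList": "Sing",
     "address": "80 Hastings Street", "contact": ["Phone", "Self"], "firstName": "Hari"}
  ]
}
""")

def to_csv(records, fields):
    """Render the selected fields of each record as one CSV; lists are joined with commas."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        row = {k: ",".join(v) if isinstance(v, list) else v for k, v in rec.items()}
        writer.writerow(row)
    return buf.getvalue()

csv1 = to_csv(payload["data"], ["ID", "Owner", "address"])
csv2 = to_csv(payload["data"], ["ID", "contact", "firstName"])
csv3 = to_csv(payload["data"], ["ID", "workList", "Owner"])
```

Each call produces one of the three CSVs from the same record set, which is the same fan-out the flow needs to achieve.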
Upvotes: 0
Views: 855
Reputation: 2032
Edit:
Ok so you want to make 3 CSV files, where each Array item is a row in the CSV.
Very similar to my original suggestion, but using EvaluateJsonPath instead of ForkRecord.
You don't need to be using FlowFile attributes for any of this flow.
Set each CSVRecordSetWriter with a schema with only the fields you want, e.g.
{
  "type": "record",
  "name": "CSV",
  "fields" : [
    {"name": "ID", "type": "string"},
    {"name": "Owner", "type": "string"},
    {"name": "address", "type": "string"}
  ]
}
So that it ignores all of the fields you don't want.
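The effect of the reduced schema is a projection: the writer only emits fields named in the schema. A minimal sketch of that idea, reading the field names out of the schema JSON above (this is not NiFi's code, just the principle):

```python
import json

# The reduced writer schema from above
schema = json.loads("""
{
  "type": "record",
  "name": "CSV",
  "fields": [
    {"name": "ID", "type": "string"},
    {"name": "Owner", "type": "string"},
    {"name": "address", "type": "string"}
  ]
}
""")

# A full record as it arrives; extra fields are simply not in the schema
record = {"ID": "789379832", "Owner": "abc", "address": "",
          "notes": "Hello abc", "yearCreated": 2021}

wanted = [f["name"] for f in schema["fields"]]
projected = {name: record.get(name) for name in wanted}
# projected now holds only ID, Owner, and address
```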
Tool for validating avro schemas
Example Avro Schema with an Array (for the contact field)
{
  "type": "record",
  "name": "CSV",
  "fields" : [
    {"name": "ID", "type": "string"},
    {"name": "contact", "type": {"type": "array", "items" : "string"}},
    {"name": "firstName", "type": "string"}
  ]
}
Original:
So firstly, when you say "below data needs to be converted to three csv files" - do you mean that each Object in the Array needs to create 3 CSV files? I.e. the example data would produce 3 CSVs for Array item 1, and 3 CSVs for Array item 2?
In this case, I think ForkRecord is a better fit for you, as you can achieve most of what you're doing in a single processor - you can process the elements of the array as individual records (see the Additional Details page for examples). With a JSON Reader and a CSV Writer you've already handled the conversion and splits.
After that, you can put down 3 ConvertRecords and route the ForkRecord success relationship into all 3 (each one gets a copy of the records, for your 3 CSVs). Each ConvertRecord has a different CSV Writer with a reduced schema to drop the fields you don't want for that CSV. Then route each ConvertRecord's success relationship into a PutFile.
Upvotes: 1