Meghashyam

Reputation: 53

How to split the multiple json objects present in a json array in Apache NiFi?

I need to split the JSON objects in a JSON array into individual JSON files using Apache NiFi and publish each one to a Kafka topic. The example input below is an array containing multiple JSON objects:

[
{
    "stops": "1 Stop",
    "ticket price": "301.20",
    "days to departure": -1,
    "date of extraction": "03/22/2019",
    "departure": ", Halifax",
    "arrival": ", Toronto",
    "flight duration": "0 days 3 hours 58 minutes",
    "airline": "Porter Airlines",
    "plane": "DE HAVILLAND DHC-8 DASH 8-400 DASH 8Q",
    "timings": [
        {
            "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "6:40pm",
            "arrival_airport": "Ottawa, ON, Canada (YOW-Macdonald-Cartier Intl.)",
            "arrival_time": "7:58pm"
        },
        {
            "departure_airport": "Ottawa, ON, Canada (YOW-Macdonald-Cartier Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "8:30pm",
            "arrival_airport": "Toronto, ON, Canada (YTZ-Billy Bishop Toronto City)",
            "arrival_time": "9:38pm"
        }
    ],
    "plane code": "DH4",
    "id": "8e6c69c8-65e0-4f1b-b540-ae61abf8aa6d"
},
{
    "stops": "Nonstop",
    "ticket price": "390.95",
    "days to departure": -1,
    "date of extraction": "03/22/2019",
    "departure": ", Halifax",
    "arrival": ", Toronto",
    "flight duration": "0 days 2 hours 35 minutes",
    "airline": "Air Canada",
    "plane": "Boeing 767-300",
    "timings": [
        {
            "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "7:40pm",
            "arrival_airport": "Toronto, ON, Canada (YYZ-Pearson Intl.)",
            "arrival_time": "9:15pm"
        }
    ],
    "plane code": "763",
    "id": "fc13c5cb-93d1-46f9-b496-abbf6faba85a"
},
{
    "stops": "Nonstop",
    "ticket price": "391.33",
    "days to departure": -1,
    "date of extraction": "03/22/2019",
    "departure": ", Halifax",
    "arrival": ", Toronto",
    "flight duration": "0 days 2 hours 30 minutes",
    "airline": "WestJet",
    "plane": "BOEING 737-700 (WINGLETS) PASSENGER",
    "timings": [
        {
            "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "7:10pm",
            "arrival_airport": "Toronto, ON, Canada (YYZ-Pearson Intl.)",
            "arrival_time": "8:40pm"
        }
    ],
    "plane code": "73W",
    "id": "4d49c24b-6fb0-4f45-ba05-a3969ce7308a"
}
]

Needed Output: Individual JSON objects like below. I would like to post each JSON object to a Kafka topic.

{
        "stops": "Nonstop",
        "ticket price": "390.95",
        "days to departure": -1,
        "date of extraction": "03/22/2019",
        "departure": ", Halifax",
        "arrival": ", Toronto",
        "flight duration": "0 days 2 hours 35 minutes",
        "airline": "Air Canada",
        "plane": "Boeing 767-300",
        "timings": [
            {
                "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
                "departure_date": "03/22/2019",
                "departure_time": "7:40pm",
                "arrival_airport": "Toronto, ON, Canada (YYZ-Pearson Intl.)",
                "arrival_time": "9:15pm"
            }
        ],
        "plane code": "763",
        "id": "fc13c5cb-93d1-46f9-b496-abbf6faba85a"
    }

Upvotes: 0

Views: 5676

Answers (2)

Arjun Arora

Reputation: 1006

This is an old post, but I still want to add my suggestion. First, @OneCricketeer is correct that you have to use the SplitJson processor for this, but the expression you give it is very important.

Given the JSON provided by @Meghashyam, I would suggest wrapping the array in a single object, like below:

{"payload":[
{
    "stops": "1 Stop",
    "ticket price": "301.20",
    "days to departure": -1,
    "date of extraction": "03/22/2019",
    "departure": ", Halifax",
    "arrival": ", Toronto",
    "flight duration": "0 days 3 hours 58 minutes",
    "airline": "Porter Airlines",
    "plane": "DE HAVILLAND DHC-8 DASH 8-400 DASH 8Q",
    "timings": [
        {
            "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "6:40pm",
            "arrival_airport": "Ottawa, ON, Canada (YOW-Macdonald-Cartier Intl.)",
            "arrival_time": "7:58pm"
        },
        {
            "departure_airport": "Ottawa, ON, Canada (YOW-Macdonald-Cartier Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "8:30pm",
            "arrival_airport": "Toronto, ON, Canada (YTZ-Billy Bishop Toronto City)",
            "arrival_time": "9:38pm"
        }
    ],
    "plane code": "DH4",
    "id": "8e6c69c8-65e0-4f1b-b540-ae61abf8aa6d"
},
{
    "stops": "Nonstop",
    "ticket price": "390.95",
    "days to departure": -1,
    "date of extraction": "03/22/2019",
    "departure": ", Halifax",
    "arrival": ", Toronto",
    "flight duration": "0 days 2 hours 35 minutes",
    "airline": "Air Canada",
    "plane": "Boeing 767-300",
    "timings": [
        {
            "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "7:40pm",
            "arrival_airport": "Toronto, ON, Canada (YYZ-Pearson Intl.)",
            "arrival_time": "9:15pm"
        }
    ],
    "plane code": "763",
    "id": "fc13c5cb-93d1-46f9-b496-abbf6faba85a"
},
{
    "stops": "Nonstop",
    "ticket price": "391.33",
    "days to departure": -1,
    "date of extraction": "03/22/2019",
    "departure": ", Halifax",
    "arrival": ", Toronto",
    "flight duration": "0 days 2 hours 30 minutes",
    "airline": "WestJet",
    "plane": "BOEING 737-700 (WINGLETS) PASSENGER",
    "timings": [
        {
            "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "7:10pm",
            "arrival_airport": "Toronto, ON, Canada (YYZ-Pearson Intl.)",
            "arrival_time": "8:40pm"
        }
    ],
    "plane code": "73W",
    "id": "4d49c24b-6fb0-4f45-ba05-a3969ce7308a"
}
]}

Now I am using a JSON path finder to view the JSON structure. Clicking the payload object shows the array items at the path x.payload.

In this case, you can use $.payload[*] as the expression in the processor and set the Execution option to Primary node under the Scheduling tab. This should queue up the individual items in the queue list. So basically we are parsing each element of the array object.
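To make the wrapping step concrete, here is a minimal Python sketch of what wrapping the array under a payload key and then selecting $.payload[*] amounts to. It is only an illustration outside NiFi, not how the processor is implemented; wrap_array and select_items are hypothetical helper names, and the two-record sample is a trimmed stand-in for the flight data.

```python
import json


def wrap_array(content, key="payload"):
    """Wrap a top-level JSON array in an object: [...] -> {"payload": [...]}."""
    return json.dumps({key: json.loads(content)})


def select_items(content, key="payload"):
    """Mimic the effect of $.payload[*]: return each element of the wrapped array."""
    return json.loads(content)[key]


# Trimmed-down stand-in for the flight array in the question.
bare = json.dumps([{"plane code": "DH4"}, {"plane code": "763"}])

wrapped = wrap_array(bare)
items = select_items(wrapped)

for item in items:
    print(json.dumps(item))
```

The selected items are the same objects that were in the original bare array, so the wrapping only changes which path expression you need.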

Upvotes: 0

OneCricketeer

Reputation: 192013

You can use the SplitJson processor. It splits a JSON array of messages into individual messages, each of which becomes the content of its own flowfile. For example, if your JSON array has 100 messages in it, the processor's split relationship will output 100 flowfiles, each containing one message.

JSONPath is $.*

https://community.hortonworks.com/questions/183055/need-to-display-each-element-of-array-in-a-separat.html
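As a rough illustration of the result you should expect from $.* on a top-level array, here is a small Python sketch that produces one JSON string per array element. This runs outside NiFi and is not the processor's implementation; split_json_array is a hypothetical helper name, and the sample is a shortened version of the flight records from the question.

```python
import json


def split_json_array(content):
    """Return one JSON string per element of a top-level JSON array."""
    records = json.loads(content)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    return [json.dumps(record) for record in records]


# Shortened stand-in for the three flight records in the question.
sample = json.dumps([
    {"airline": "Porter Airlines", "plane code": "DH4"},
    {"airline": "Air Canada", "plane code": "763"},
    {"airline": "WestJet", "plane code": "73W"},
])

messages = split_json_array(sample)

# Each string corresponds to the content of one outgoing flowfile.
for message in messages:
    print(message)
```

Each resulting string is what you would then publish to Kafka as a separate message.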

Upvotes: 2
