funnelCONN
funnelCONN

Reputation: 149

Extract and modify text from JSON (Apache Ni-Fi)

I'm using InvokeHttp to get json as response from REST API. My json looks like:

{
    "resourceNames": [
        "customers/123",
        "customers/12345",
        "customers/555",
        "customers/9890"
    ]
}

How can i extract from this json 4 strings without customers prefix?

123
12345
555
9890

Upvotes: 0

Views: 226

Answers (4)

mattyb
mattyb

Reputation: 12083

You can use JoltTransformJSON -> PutDatabaseRecord using the following JOLT spec:

[
  {
    "operation": "shift",
    "spec": {
      "resourceNames": {
        "*": {
          "customers/*": {
            "$(0,1)": "[].id"
          }
        }
      }
    }
  }
]

That will create the following output:

[ {
  "id" : "123"
}, {
  "id" : "12345"
}, {
  "id" : "555"
}, {
  "id" : "9890"
} ]

which you can then send to PutDatabaseRecord and it will insert a row for each JSON object in your flow file (assuming you're using a JsonTreeReader in PutDatabaseRecord) with the given value into the id column.

Upvotes: 1

Joxebus
Joxebus

Reputation: 205

You can do this with Groovy, if you already have the original JSON you can do something like this:

def output = input['resourceNames'].collect{ name ->
  name - 'customers/'
}
println JsonOutput.toJson(output)

This will produce the following output:

["123","12345","555","9890"]

And here is the full sample code that I created for this:

import groovy.json.JsonSlurper
import groovy.json.JsonOutput

String json = '''{
    "resourceNames": [
        "customers/123",
        "customers/12345",
        "customers/555",
        "customers/9890"
    ]
}'''

Map input = new JsonSlurper().parseText(json)

def output = input['resourceNames'].collect{ name ->
  name - 'customers/'
}

println JsonOutput.toJson(output)

Upvotes: 1

Up_One
Up_One

Reputation: 5271

I guess if you want to use pure NiFi processors you would use a ReplaceText to clean the "customers/" enter image description here

Wil give you this :

{
    "resourceNames": [
        "123",
        "12345",
        "555",
        "9890"
    ]
}

And you can use EvaluateJson to parse it with a valX = $.resourceNames.[*]

But if you want to promote this to a database insert, you would do a split on the incoming flow instead of putting it in a attribute.

You could use splitjson insdted of EvaluateJson - with split on $.resourceNames then ExtractText with an added attribute lets say datapoint = (?s)(^.*$) value which is then sent to PutSQL where you SQL Statement looks like insert into mytbl values (${datapoint}).

Again - you could use a PutRecord proc for a more performant setup , but that will require setting other stuff and is a bit more complex, if you have large numbers of flows maybe PutRecord is the route, the Split, Evaluate, extract processors endup beeing very slow and they consume alot of ram.

Is quite heavy - due to the nature of ReplaceText, and also if your string changes you can pass it as a parameter.

One good approach would be a JoltTransformJSON processor, is quite tricky to get the Jolt Specification right, but it will much faster than ReplaceText.

Upvotes: 2

Mike Thomsen
Mike Thomsen

Reputation: 37506

The fastest way would be with ExecuteScript and a little Groovy script that would use Groovy's JsonSlurper to parse the JSON, extract the values and then rewrite the flowfile's content.

Upvotes: 1

Related Questions