Improving flow in Apache NiFi

Question

I'm trying to simplify a flow in Apache NiFi.

What I want:

Call the Facebook Graph API to receive campaigns for ad accounts and save them to the DB. Response example:

[ {
  "start_date" : "2018-10-15",
  "stop_date" : "2019-03-31",
  "id" : "608962192",
  "account_id" : "1007311",
  "name" : "Axe_Instagram_aug-dec2018_col",
  "status" : "ACTIVE",
  "start_time" : "2018-10-15",
  "stop_time" : "2019-03-31"
}, {
  "start_date" : "2018-10-08",
  "stop_date" : "2018-10-31",
  "id" : "61084542",
  "account_id" : "10240051",
  "name" : "Axe_IG_aug-dec2018",
  "status" : "ACTIVE",
  "start_time" : "2018-10-08",
  "stop_time" : "2018-10-31"
} ]

Call the Facebook Graph API to receive ads for ad accounts and save them to the DB. Response example:

[
   {
      "id":"23845",
      "account_id":"251977841",
      "name":"Post_2",
      "status":"ACTIVE",
      "campaign_id":"2384345125",
      "adset_id":"238125",
      "bid_amount":87,
      "updated_time":"2019-06-20T14:21:06+0300"
   },
   {
      "id":"23843453786320125",
      "account_id":"2251971478158841",
      "name":"Post_1",
      "status":"ACTIVE",
      "campaign_id":"238225",
      "adset_id":"2384325",
      "bid_amount":87,
      "updated_time":"2019-06-20T14:21:06+0300"
   }
]

Filter ads:
- Keep only active campaigns (from campaigns), using this rule: stop_date is empty (NULL) OR stop_date > '2021-01-01'.
- Check whether the campaign_id of each ad is contained in the result set above.
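
To make the two rules concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the intended logic, not part of the NiFi flow; the function names and the sample rows are made up, and an empty string is assumed to count as an empty stop_date.

```python
from datetime import date

# Cutoff taken from the rule above: stop_date must be later than this.
CUTOFF = date(2021, 1, 1)


def is_active(campaign):
    """A campaign is active if stop_date is empty/NULL or later than the cutoff."""
    stop = campaign.get("stop_date")
    if not stop:  # None, missing key, or empty string (assumption)
        return True
    return date.fromisoformat(stop) > CUTOFF


def filter_ads(campaigns, ads):
    """Keep only ads whose campaign_id belongs to an active campaign."""
    active_ids = {c["id"] for c in campaigns if is_active(c)}
    return [ad for ad in ads if ad["campaign_id"] in active_ids]


if __name__ == "__main__":
    # Hypothetical sample data, shaped like the API responses above.
    campaigns = [
        {"id": "c1", "stop_date": None},
        {"id": "c2", "stop_date": "2021-06-30"},
        {"id": "c3", "stop_date": "2019-03-31"},
    ]
    ads = [
        {"id": "a1", "campaign_id": "c1"},
        {"id": "a2", "campaign_id": "c2"},
        {"id": "a3", "campaign_id": "c3"},
    ]
    print([ad["id"] for ad in filter_ads(campaigns, ads)])
```

Building the set of active campaign ids once and testing membership per ad is the part I would like to reproduce in NiFi without one lookup per ad.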

My current approach:

The two steps above are done; all data is stored in the DB.
For each flow file from the ads API I use the following flow:
SplitJson to split the ads into one flow file per ad;
EvaluateJsonPath to store campaign_id in a flow file attribute;
ExecuteSQL with the following statement for each flow file:

select *
from facebook_api.campaigns c 
where c.id = '${campaign.id}'
and (c.stop_date is null or c.stop_date > '2021-01-01')

This returns either nothing or the campaign, if it is active by my criteria. After that I can filter the flow files with RouteOnAttribute: ${executesql.row.count:lt(1)}.

But there is a problem. Splitting the 300 source flow files creates about 100,000 flow files, so I end up making about 100,000 unnecessary requests to the DB.

Can I apply the same logic without splitting the flow files?
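
Since both result sets are already stored in the DB, the kind of thing I have in mind is one set-based query instead of one query per ad. The sketch below shows that idea with Python's built-in sqlite3 module and an in-memory database. The `ads` table and its columns are assumptions (only `facebook_api.campaigns` is a real table name from my setup), the schema prefix is dropped because SQLite has no schemas, and the rows are made up.

```python
import sqlite3

# One join replaces the per-ad lookup: an ad is kept only if its campaign
# exists and has no stop_date or a stop_date after 2021-01-01.
ACTIVE_ADS_SQL = """
SELECT a.id, a.campaign_id
FROM ads a
JOIN campaigns c ON c.id = a.campaign_id
WHERE c.stop_date IS NULL OR c.stop_date > '2021-01-01'
ORDER BY a.id
"""


def build_demo_db():
    """Create an in-memory database with hypothetical campaigns and ads."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE campaigns (id TEXT PRIMARY KEY, stop_date TEXT)")
    conn.execute("CREATE TABLE ads (id TEXT PRIMARY KEY, campaign_id TEXT)")
    conn.executemany(
        "INSERT INTO campaigns VALUES (?, ?)",
        [("c1", None), ("c2", "2021-06-30"), ("c3", "2019-03-31")],
    )
    conn.executemany(
        "INSERT INTO ads VALUES (?, ?)",
        [("a1", "c1"), ("a2", "c2"), ("a3", "c3"), ("a4", "missing")],
    )
    return conn


def active_ads(conn):
    """Return (ad id, campaign id) pairs for ads of active campaigns."""
    return conn.execute(ACTIVE_ADS_SQL).fetchall()


if __name__ == "__main__":
    print(active_ads(build_demo_db()))
```

If something like this is viable, a single ExecuteSQL call could presumably return all matching ads at once, but I'm not sure whether that is the idiomatic way to do it in NiFi.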
