Joel Gooch

Reputation: 11

Merging Attributes in Apache NiFi after an ExtractText (using Regex)

After using the NiFi ExtractText processor to extract matches from the flowfile content using regex (in multiple-capture mode), you get a series of numerically ascending attributes, e.g. date, date.0, date.1, date.2, ..., representing multiple captures throughout the text.

What I want is a single attribute ${dates}, that contains each of the entries captured. Can anybody help?

(NiFi v1.5.0)

Upvotes: 1

Views: 5372

Answers (3)

Mahendran V M

Reputation: 3496

You can get those values by using the getDelimitedField Expression Language function.

If you want to capture your flowfile content, create a new property (attribute) in the ExtractText processor itself, with the following name and regex:

dates:(.*)

If you have content like the following:

1,Hi,23,001

It is stored in an attribute named dates, and you can use it in the flow as ${dates}.

If you want to get the individual values out of that attribute, use an UpdateAttribute processor with expressions like these:

ID: ${dates:getDelimitedField(1)}
Name: ${dates:getDelimitedField(2)}
Age: ${dates:getDelimitedField(3)}

You can then use them in the flow as usual, e.g. ${ID}, ${Name}, ${Age}.
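As a rough illustration of what getDelimitedField does with the sample content, here is a small Python stand-in. It is only a sketch: the function name get_delimited_field is hypothetical, it assumes a 1-based field index and a comma delimiter (as used above), and it ignores the quote, escape and strip-character arguments that NiFi's function also accepts. The empty-string result for an out-of-range index is an assumption of this sketch.

```python
def get_delimited_field(subject: str, index: int, delimiter: str = ",") -> str:
    """Hypothetical Python stand-in for NiFi's getDelimitedField.

    Uses a 1-based index and splits on a plain delimiter; quoting,
    escaping and stripping are not modelled.
    """
    fields = subject.split(delimiter)
    if index < 1 or index > len(fields):
        return ""  # assumption: an out-of-range index yields an empty string
    return fields[index - 1]


# Value that ExtractText stored in the `dates` attribute
dates = "1,Hi,23,001"

# Equivalent of the three UpdateAttribute properties
attributes = {
    "ID": get_delimited_field(dates, 1),
    "Name": get_delimited_field(dates, 2),
    "Age": get_delimited_field(dates, 3),
}
print(attributes)  # {'ID': '1', 'Name': 'Hi', 'Age': '23'}
```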

Hope this helps.

Upvotes: 0

notNull

Reputation: 31520

Use an UpdateAttribute processor and add a new property with the following Expression Language:

dates:

${allMatchingAttributes("date.*"):join(",")}

This expression joins the values of date, date.0, date.1 and date.2 (every attribute whose name matches the regex date.*) with commas, and adds the result to the flowfile as the dates attribute.
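The behaviour can be approximated in Python to see what the combined value looks like. This is a hedged sketch, not NiFi's implementation: join_matching_attributes is a hypothetical helper, it assumes the regex has to match the whole attribute name, and it sorts names alphabetically, whereas the order NiFi uses when joining may differ. The sample attributes are copied from the ExtractText output in the next answer.

```python
import re


def join_matching_attributes(attributes: dict, pattern: str, separator: str) -> str:
    """Hypothetical approximation of ${allMatchingAttributes(pattern):join(separator)}."""
    name_regex = re.compile(pattern)
    # Assumption: the pattern must match the entire attribute name
    values = [
        value
        for name, value in sorted(attributes.items())
        if name_regex.fullmatch(name)
    ]
    return separator.join(values)


attributes = {
    "date": "2018/02/21",
    "date.0": "2018/02/21 09:25:16.832 -0800",
    "date.1": "2018/02/21",
    "date.2": "09:25:16.832",
    "filename": "813454866687188",
}
dates = join_matching_attributes(attributes, "date.*", ",")
print(dates)  # 2018/02/21,2018/02/21 09:25:16.832 -0800,2018/02/21,09:25:16.832
```

Because date and date.1 both hold the first capture group in that output, the joined value repeats it; a narrower name pattern is needed if that duplication matters.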

For more details, see:

https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#join

Upvotes: 2

Andy

Reputation: 14194

The date.0 attribute is what you are likely interested in.

For example, I used a GenerateFlowFile processor to create flowfiles which contain the text This is a message generated at ${now():format('yyyy/MM/dd HH:mm:ss.SSS Z')} which will result in content like This is a message generated at 2018/02/21 09:25:16.832 -0800.
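To make the sample content concrete, the Python snippet below rebuilds the same sentence from a fixed timestamp. It is an approximate stand-in, not what NiFi runs: Python's strftime has no direct equivalent of the Java SSS (milliseconds) pattern letter, so the millisecond part is formatted by hand, and a fixed datetime taken from the example replaces now() so the output is reproducible.

```python
from datetime import datetime, timedelta, timezone


def nifi_style_timestamp(dt: datetime) -> str:
    # Mirrors the Java pattern 'yyyy/MM/dd HH:mm:ss.SSS Z':
    # SSS = three-digit milliseconds, Z = numeric UTC offset such as -0800
    millis = dt.microsecond // 1000
    return dt.strftime("%Y/%m/%d %H:%M:%S.") + f"{millis:03d}" + dt.strftime(" %z")


pst = timezone(timedelta(hours=-8))
generated = datetime(2018, 2, 21, 9, 25, 16, 832000, tzinfo=pst)
content = f"This is a message generated at {nifi_style_timestamp(generated)}."
print(content)  # This is a message generated at 2018/02/21 09:25:16.832 -0800.
```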

I then used an ExtractText processor with a new attribute called date and the following regex:

(\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}\.\d{3}) .?\d{4}

You can see this has two capture groups: one for the year/month/day portion and one for the hour/minute/second/millis portion.

After running ExtractText, this is the result. You can see that the individual capture groups are in date.1 and date.2, while date.0 contains the entire regular expression match (and date itself repeats the first capture group).
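The mapping from capture groups to attribute names in that output can be reproduced with Python's re module, using the same content and regex. This is a sketch of the single-match case only: extract_like_extracttext is a hypothetical helper, repeating capture groups are not modelled, and it simply mirrors the naming visible in the output (date for the first group, date.0 for the whole match, date.1 and date.2 for the individual groups).

```python
import re

CONTENT = "This is a message generated at 2018/02/21 09:25:16.832 -0800."
DATE_REGEX = r"(\d{4}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}\.\d{3}) .?\d{4}"


def extract_like_extracttext(name: str, regex: str, text: str) -> dict:
    """Hypothetical helper: names attributes after the first match's groups."""
    compiled = re.compile(regex)
    match = compiled.search(text)
    if match is None:
        return {}
    attrs = {}
    # The bare attribute name holds the first capture group (if any)
    if compiled.groups >= 1:
        attrs[name] = match.group(1)
    # name.0 is the whole match, name.1..name.N are the individual groups
    for i in range(compiled.groups + 1):
        attrs[f"{name}.{i}"] = match.group(i)
    return attrs


for key, value in extract_like_extracttext("date", DATE_REGEX, CONTENT).items():
    print(f"{key}: {value}")
# date: 2018/02/21
# date.0: 2018/02/21 09:25:16.832 -0800
# date.1: 2018/02/21
# date.2: 09:25:16.832
```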

--------------------------------------------------
Standard FlowFile Attributes
Key: 'entryDate'
    Value: 'Wed Feb 21 09:25:16 PST 2018'
Key: 'lineageStartDate'
    Value: 'Wed Feb 21 09:25:16 PST 2018'
Key: 'fileSize'
    Value: '62'
FlowFile Attribute Map Content
Key: 'date'
    Value: '2018/02/21'
Key: 'date.0'
    Value: '2018/02/21 09:25:16.832 -0800'
Key: 'date.1'
    Value: '2018/02/21'
Key: 'date.2'
    Value: '09:25:16.832'
Key: 'filename'
    Value: '813454866687188'
Key: 'firstName'
    Value: 'Andy'
Key: 'fullName'
    Value: ' '
Key: 'lastName'
    Value: 'LoPresto'
Key: 'path'
    Value: './'
Key: 'uuid'
    Value: '9e5de17c-2d62-401e-ad13-d49adf5fdd85'
--------------------------------------------------
This is a message generated at 2018/02/21 09:25:16.832 -0800.

Upvotes: 0
