Filippo Loddo

Reputation: 1096

NiFi: Remove fixed number of header lines from file

I'm processing a file and I'd like to remove (trim) the first X header lines so that only the data remains, preferably without using regular expressions.

Thanks

Upvotes: 7

Views: 5372

Answers (1)

Biplob Biswas

Reputation: 1881

You can remove the first X header lines by using the ExecuteScript processor in NiFi.

The following is an example Jython script that I wrote for my own use:

from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        # Read the whole flow file content into memory as a list of lines
        text = IOUtils.readLines(inputStream, StandardCharsets.UTF_8)
        # Skip the first 3 (header) lines and write the rest back out
        for line in text[3:]:
            # Encode explicitly so the Java OutputStream receives bytes
            outputStream.write(bytearray((line + "\n").encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    # Rename the output (from my own flow; drop this line if you don't need it)
    flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename').split('.')[0] + '_translated.json')
    session.transfer(flowFile, REL_SUCCESS)

This removes the first 3 lines, but you can change the 3 in the slice to drop more or fewer lines.
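Note that the script reads every line into memory before writing. For large files it may be preferable to skip the header while streaming. Here is a minimal sketch of that idea in plain Python, outside NiFi, so it can be tried on its own. The helper name `skip_header_lines` and its `header_lines` parameter are hypothetical, not part of NiFi or of my script, and the in-memory streams only stand in for the flow file content:

```python
import io

def skip_header_lines(in_stream, out_stream, header_lines=3):
    """Copy in_stream to out_stream, dropping the first header_lines lines.

    Hypothetical helper: lines are handled one at a time, so the whole
    content is never held in memory at once.
    """
    for index, line in enumerate(in_stream):
        if index >= header_lines:
            # Each line keeps its original line ending
            out_stream.write(line)

# Stand-in data: three header lines followed by two data rows
source = io.BytesIO(b"h1\nh2\nh3\nrow1\nrow2\n")
target = io.BytesIO()
skip_header_lines(source, target, header_lines=3)
result = target.getvalue()
```

If the input has fewer lines than `header_lines`, the output is simply empty. Adapting this to ExecuteScript would mean wrapping the Java input stream in a line reader inside `process`, which I have not shown here.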

Hope that helps.

Upvotes: 8
