I have a CSV file with string fields containing digits grouped by spaces as a thousands separator, for example "1 025 000" instead of "1025000".
I want to remove these spaces, but only in the numeric fields, so that I can convert the values to double with a Jolt transform and output a JSON file. I'm doing this in Apache NiFi with the ReplaceText processor and a regular expression.
Here is an example of my CSV:
Client1;Client2;Client3;price1;price2;price3
john smith;john2 smith2;john3 smith3;1 145;125;129 009
The expression I'm using doesn't do the job: (\s?=(\d{3},?)+(?:\.\d{1,3})?")
Thanks in advance!
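A note on why that expression never matches: `\s?=` means "an optional whitespace character followed by a literal `=`" (a lookahead would be written `(?=...)`), and the pattern also expects commas and a closing `"`, none of which occur in the sample line. A pattern that targets only a space sitting between a digit and a following group of exactly three digits can use a lookbehind and a lookahead instead. The sketch below checks that idea in plain Java, assuming ReplaceText is set to the Regex Replace strategy (which uses Java regex syntax) with this pattern as the Search Value and an empty Replacement Value; the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class ThousandsSeparator {
    // Matches a single space preceded by a digit and followed by exactly three
    // digits that end at a word boundary (';', another space, or end of line).
    // Assumption: the same pattern string is what you would put in ReplaceText.
    static final Pattern SEPARATOR = Pattern.compile("(?<=\\d) (?=\\d{3}\\b)");

    static String strip(String line) {
        // Removing the matched spaces joins the digit groups back together.
        return SEPARATOR.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        String line = "john smith;john2 smith2;john3 smith3;1 145;125;129 009";
        System.out.println(strip(line));
        // prints: john smith;john2 smith2;john3 smith3;1145;125;129009
    }
}
```

"john2 smith2" is left alone because the space is not followed by three digits. The caveat is that a text field containing something like "2 100" would also be joined, since a regex over the whole line cannot tell which column it is in.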
Although you can do this in NiFi, I would suggest changing the source if you can, so that the numbers are written without the thousands separator in the first place.
Otherwise, one approach that comes to mind is to use the ExecuteScript processor to remove the spaces.
Assume your CSV looks like this:
name,val
item1, 1 345 000
item2, 2 432
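You can use the SplitRecord processor (for example with a CSVReader as the Record Reader, a JsonRecordSetWriter as the Record Writer, and Records Per Split set to 1) to convert the CSV to JSON, one record per flowfile. Feed its output to ExecuteScript.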
The following Groovy code reads the flowfile content, parses it as JSON and removes all whitespace from the val field:
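import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets
import groovy.json.JsonOutput
import groovy.json.JsonSlurper
def flowFile = session.get()
if (!flowFile) return
flowFile = session.write(flowFile, { inputStream, outputStream ->
    def input = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    def parsed = new JsonSlurper().parseText(input)
    // JsonRecordSetWriter writes a JSON array by default, so accept an array or a single object
    def records = (parsed instanceof List) ? parsed : [parsed]
    records.each { rec ->
        // Remove every whitespace character from the "val" field
        rec.val = rec.val?.toString()?.replaceAll('\\s', '')
    }
    // toString() on a parsed map is Groovy map syntax, not JSON, so serialize explicitly
    outputStream.write(JsonOutput.prettyPrint(JsonOutput.toJson(parsed)).getBytes(StandardCharsets.UTF_8))
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)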
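The script above strips every whitespace character from val, which is fine when that field is always numeric. When text and numbers share a record, as in the question's semicolon-separated file, a more cautious variant only rewrites fields that look like space-grouped numbers. Below is a minimal sketch of that idea in plain Java, outside NiFi, assuming ';' as the delimiter and no quoted fields; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class NumericFieldCleaner {
    // A field made only of digits grouped in threes by single spaces,
    // e.g. "1 145" or "129 009"; a plain number such as "125" also matches.
    static final Pattern GROUPED_NUMBER = Pattern.compile("\\d{1,3}( \\d{3})*");

    // Assumption: fields are separated by ';' and never quoted.
    static String cleanLine(String line) {
        String[] fields = line.split(";", -1); // -1 keeps trailing empty fields
        for (int i = 0; i < fields.length; i++) {
            String trimmed = fields[i].trim();
            if (GROUPED_NUMBER.matcher(trimmed).matches()) {
                fields[i] = trimmed.replace(" ", "");
            }
            // Text fields such as "john2 smith2" do not match and stay as they are.
        }
        return String.join(";", fields);
    }

    public static void main(String[] args) {
        System.out.println(cleanLine("john smith;john2 smith2;john3 smith3;1 145;125;129 009"));
        // prints: john smith;john2 smith2;john3 smith3;1145;125;129009
    }
}
```

Checking the whole field against a pattern, rather than searching inside the line, is what keeps a text value like "flat 2 100" from being altered.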
Connect the success relationship of ExecuteScript to whatever processor your use case requires. For the input above, each resulting flowfile will contain one record with the spaces removed, roughly like this:
{
"name" : "item1",
"val" : "1345000"
}
{
"name" : "item2",
"val" : "2432"
}
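The val fields are still JSON strings at this point, but once the separator is gone they convert cleanly to a double, which is what the question needs before its Jolt step. A small Java check of that behaviour (the class and method names are illustrative): `Double.parseDouble` rejects a number with spaces inside it and accepts the cleaned value.

```java
public class ParseCheck {
    // Removes thousands-separator whitespace, then parses the value as a double.
    static double toDouble(String raw) {
        return Double.parseDouble(raw.replaceAll("\\s", ""));
    }

    public static void main(String[] args) {
        try {
            Double.parseDouble("1 345 000");
        } catch (NumberFormatException e) {
            // Spaces inside the number make parsing fail.
            System.out.println("not a number: " + e.getMessage());
        }
        System.out.println(toDouble("1 345 000")); // prints 1345000.0
    }
}
```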