user7752317

remove spaces from string - regex

I have a CSV file with string fields containing digits separated by whitespace (used as a thousands separator), for example "1 025 000" instead of "1025000".

I want to remove that whitespace, but only in the fields containing digits, so that I can convert them to double with a Jolt transform and get a JSON file as output. I'm doing this in Apache NiFi with the ReplaceText processor, using a regular expression.

This is an example of my CSV:

Client1;Client2;Client3;price1;price2;price3
john smith;john2 smith2;john3 smith3;1 145;125;129 009

The expression I'm using doesn't do the job: (\s?=(\d{3},?)+(?:\.\d{1,3})?")

Thanks in advance!

Upvotes: 1

Views: 1619

Answers (1)

Sivaprasanna Sethuraman

Reputation: 4132

Although you can do that in NiFi, I would suggest changing the source, if possible, so that the numbers are formatted and written correctly in the first place.

Anyway, one way that immediately comes to mind is to use the ExecuteScript processor to handle the whitespace.
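The whitespace handling itself is small. As a rough sketch in plain Java, outside NiFi, the rule "strip whitespace only from fields made of digits" could look like the following. The class and method names (DigitGroupCleaner, clean) are hypothetical and only illustrate the idea; they are not part of NiFi.

```java
import java.util.regex.Pattern;

class DigitGroupCleaner {
    // Assumption: a "numeric" field consists only of digits and whitespace
    // and contains at least one digit. Anything else is left untouched.
    private static final Pattern DIGITS_AND_SPACES = Pattern.compile("\\s*\\d[\\d\\s]*");

    // Removes all whitespace from a numeric field; returns other fields unchanged.
    static String clean(String field) {
        if (DIGITS_AND_SPACES.matcher(field).matches()) {
            return field.replaceAll("\\s", "");
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(clean(" 1 345 000"));   // prints 1345000
        System.out.println(clean("john2 smith2")); // prints john2 smith2
    }
}
```

Checking the whole field first is what keeps values such as "john2 smith2" intact, since they contain letters as well as digits.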

Assume your CSV looks like this:

name,val
item1, 1 345 000
item2, 2 432

You can use the SplitRecord processor (configured with a CSV reader and a JSON writer) to convert the CSV to JSON and split it into one record per flowfile. Feed its output to ExecuteScript.

You can use the following Groovy code to read the flowfile content and remove all the whitespace from the val field:

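import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

def flowFile = session.get()
if (!flowFile) return

def jsonSlurper = new JsonSlurper()

// Rewrite the flowfile content in place.
// Assumes the content is a single JSON object with a string field named "val".
flowFile = session.write(flowFile, { inputStream, outputStream ->
    def input = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    def inputJson = jsonSlurper.parseText(input)
    // Strip every whitespace character from the numeric string
    inputJson.val = inputJson.val.replaceAll("\\s", "")
    // Serialize with JsonOutput; toString() on the parsed map would not produce valid JSON
    outputStream.write(JsonOutput.toJson(inputJson).getBytes(StandardCharsets.UTF_8))
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)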

Connect the success relationship of ExecuteScript to whichever processor your use case requires. For the input above, the output will be equivalent to the following, one flowfile per record:

{
  "name" : "item1",
  "val" : "1345000"
}

{
  "name" : "item2",
  "val" : "2432"
}
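If you would rather stay with a regex, a lookaround pattern is one possible alternative. The sketch below is plain Java (NiFi uses Java regular expressions) and runs against the CSV line from the question. The pattern and the class name ThousandsSeparatorRegex are my own assumptions, not something taken from NiFi. In ReplaceText, the same pattern would presumably go into Search Value with the Regex Replace strategy and an empty Replacement Value, but I have not verified that configuration.

```java
class ThousandsSeparatorRegex {
    // Matches one whitespace character that is preceded by a digit and
    // followed by exactly three digits (then a non-digit or end of input).
    // Lookarounds consume nothing, so only the whitespace itself is removed.
    static final String GROUP_SPACE = "(?<=\\d)\\s(?=\\d{3}(?!\\d))";

    static String strip(String line) {
        return line.replaceAll(GROUP_SPACE, "");
    }

    public static void main(String[] args) {
        String line = "john smith;john2 smith2;john3 smith3;1 145;125;129 009";
        // prints john smith;john2 smith2;john3 smith3;1145;125;129009
        System.out.println(strip(line));
    }
}
```

One caveat: the pattern looks at characters, not fields, so a text field that happens to contain something like "4 123" would be changed as well.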

Upvotes: 4
