NewbieCoder

Reputation: 21

How do I replace every nth occurrence of a special character, such as a pipe delimiter, with another character in Scala?

I'm new to Spark using Scala and I need to replace every nth occurrence of the delimiter with the newline character.

So far, I have managed to insert a newline after each pipe delimiter, but I'm unable to replace the delimiter itself.

My input string is

val txt = "January|February|March|April|May|June|July|August|September|October|November|December"

println(txt.replaceAll(".\\|", "$0\n"))

The above statement generates the following output.

January|
February|
March|
April|
May|
June|
July|
August|
September|
October|
November|
December

I referred to the suggestion at https://salesforce.stackexchange.com/questions/189923/adding-comma-separator-for-every-nth-character but when I put a number in the curly braces, I only end up adding the newline 2 characters after the delimiter.

I'm expecting my output to be as given below.

January|February
March|April
May|June
July|August
September|October
November|December

How do I change my regular expression to get the desired output?

Update: My friend suggested I try the following statement

println(txt.replaceAll("(.*?\\|){2}", "$0\n"))

and this produced the following output

January|February|
March|April|
May|June|
July|August|
September|October|
November|December

Now I just need to get rid of the pipe symbol at the end of each line.

Upvotes: 1

Views: 1311

Answers (2)

jwvh

Reputation: 51271

You want to move the 2nd bar | outside of the capture group.

txt.replaceAll("([^|]+\\|[^|]+)\\|", "$1\n")
//val res0: String =
//  January|February
//  March|April
//  May|June
//  July|August
//  September|October
//  November|December

Regex Explained (regex is not Scala)

  • ( - start a capture group
  • [^|] - any character as long as it's not the bar | character
  • [^|]+ - 1 or more of those (any) non-bar chars
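  • \\| - followed by a single bar char | (escaped because a bare | means alternation in regex; the backslash is doubled inside a Scala string literal)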
  • [^|]+ - followed by 1 or more of any non-bar chars
  • ) - close the capture group
  • \\| - followed by a single bar char (not in capture group)
  • "$1\n" - replace the entire matching string with just the first capture group $1 ($0 would be the entire matching string), followed by the newline char
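
To see the difference between $0 and $1 in practice, here is a minimal sketch that runs the same pattern with both replacements. The object and method names (GroupRefDemo, withWholeMatch, withFirstGroup) are made up for illustration and are not part of the answer itself.

```scala
object GroupRefDemo {
  // Same pattern as in the answer: two bar-free fields joined by a bar go into
  // group 1, and the bar that follows them is matched outside the group.
  val pattern = "([^|]+\\|[^|]+)\\|"

  // $0 re-inserts the whole match, so the trailing bar survives.
  def withWholeMatch(s: String): String = s.replaceAll(pattern, "$0\n")

  // $1 re-inserts only group 1, so the trailing bar is dropped.
  def withFirstGroup(s: String): String = s.replaceAll(pattern, "$1\n")

  def main(args: Array[String]): Unit = {
    val sample = "January|February|March|April"
    println(withWholeMatch(sample)) // first line ends with a bar
    println(withFirstGroup(sample)) // first line has no trailing bar
  }
}
```

The last pair in the sample is left untouched by both calls, because no bar follows it and the pattern therefore cannot match there.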

UPDATE

For the general case of N repetitions, regex becomes a bit more cumbersome, at least if you're trying to do it with a single regex formula.

The simplest thing to do (not the most efficient but simple to code) is to traverse the String twice.

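val n = 5
// Pass 1 appends a newline after every n-th "field|" run ($0 is the whole match);
// pass 2 drops the bar left dangling just before each newline.
txt.replaceAll(s"(\\w+\\|){$n}", "$0\n")
   .replaceAll("\\|\n", "\n")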
//val res0: String =
//  January|February|March|April|May
//  June|July|August|September|October
//  November|December
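
For comparison, a single-pass variant can be sketched by building the pattern from n: put n-1 "field plus bar" repetitions and one more field inside the capture group, and leave the n-th bar outside it. This is only a sketch. The names EveryNthBar and breakEvery are hypothetical, and it assumes there are no empty fields (nothing like "a||b"), since [^|]+ needs at least one character.

```scala
object EveryNthBar {
  // Hypothetical helper: replaces every n-th bar in `s` with a newline.
  def breakEvery(s: String, n: Int): String = {
    require(n >= 1, "n must be at least 1")
    // Group 1 = (n-1) "field|" runs plus one more field; the n-th bar stays
    // outside the group, so "$1\n" drops it. (?: ... ) is a non-capturing group.
    val pattern = s"((?:[^|]+\\|){${n - 1}}[^|]+)\\|"
    s.replaceAll(pattern, "$1\n")
  }

  def main(args: Array[String]): Unit = {
    val txt = "January|February|March|April|May|June|July|August|September|October|November|December"
    println(breakEvery(txt, 5))
  }
}
```

With n = 2 the generated pattern matches the same text as the two-field regex at the top of this answer.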

Upvotes: 3

Nikunj Kakadiya

Reputation: 3008

You could first split the string on '|' to get an array of strings, then loop through it and rejoin the pieces, putting a newline in place of every second delimiter.

val txt = "January|February|March|April|May|June|July|August|September|October|November|December"
val fields = txt.split("\\|")
val sb = new StringBuilder
for (i <- fields.indices) {
  sb.append(fields(i))
  // Between fields, emit a newline after every 2nd field and a bar otherwise.
  if (i < fields.length - 1) sb.append(if (i % 2 == 1) "\n" else "|")
}
val finalout = sb.toString  // an unpaired last field is kept when the count is odd
println(finalout)
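
The same split-and-rejoin idea can be generalised to any group size and delimiter with grouped from the standard collections. The sketch below is one possible shape, not a tested library function. The names SplitRegroup and replaceEveryNth are invented here, and Pattern.quote is used so that the delimiter is treated literally by split.

```scala
import java.util.regex.Pattern

object SplitRegroup {
  // Hypothetical helper: replaces every n-th `delim` in `s` with `replacement`.
  def replaceEveryNth(s: String, delim: String, n: Int, replacement: String = "\n"): String = {
    require(n >= 1, "n must be at least 1")
    s.split(Pattern.quote(delim), -1) // limit -1 keeps trailing empty fields
      .grouped(n)                     // chunks of n fields; the last may be shorter
      .map(_.mkString(delim))         // rejoin each chunk with the original delimiter
      .mkString(replacement)          // chunks are separated by the replacement
  }

  def main(args: Array[String]): Unit = {
    val txt = "January|February|March|April|May"
    println(replaceEveryNth(txt, "|", 2))
  }
}
```

Because the last chunk is allowed to be shorter than n, a leftover field such as May in the usage example ends up on its own line instead of being dropped.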

Upvotes: 1
