Defcon

Reputation: 817

Regex to trim all spaces before and after delimiter Spark Scala

I am reading a delimited text file with Spark in Scala. I am trying to write a regex that trims the whitespace immediately before and after the delimiter ~. Currently, my code removes all spaces, including the ones inside fields. I am looking for suggestions on how to accomplish this, or for improvements. Perhaps some sort of trim function would be simpler.

def truncateRDD(fileName: String): RDD[String] = {
  val rdd = sc.textFile(fileName)
  rdd.map(lines => lines.replaceAll("""[\t\p{Zs}]+""", ""))
}

Input:

20161111 ~     ~10~1234~ "This is an example" ~P15~-EXAMPLE~2017~ 2014567EXAMPLE

Desired Output:

20161111~~10~1234~"This is an example"~P15~-EXAMPLE~2017~2014567EXAMPLE

Upvotes: 0

Views: 3474

Answers (1)

Tzach Zohar

Reputation: 37832

The simplest approach is probably to split on your delimiter (~), trim each resulting token, and then combine the tokens back into a single String using mkString. Passing -1 as the limit to split keeps trailing empty fields, which split drops by default:

rdd.map(_.split("~", -1).map(_.trim).mkString("~"))

Alternatively, using a regex that matches each delimiter together with any whitespace around it:

rdd.map(_.replaceAll("\\s*~\\s*", "~"))
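
To try both approaches without a Spark cluster, here is a minimal sketch in plain Scala that runs them on the sample line from the question. The object name TrimAroundDelimiter and the helpers trimBySplit and trimByRegex are hypothetical names chosen for illustration. The final .trim in the regex version is my own addition, assuming you also want whitespace at the very start and end of the line removed.

```scala
object TrimAroundDelimiter {
  // Split on the delimiter, trim each field, and rejoin.
  // The -1 limit keeps trailing empty fields, which split drops by default.
  def trimBySplit(line: String): String =
    line.split("~", -1).map(_.trim).mkString("~")

  // Collapse whitespace on either side of each delimiter.
  // The trailing .trim (an assumption, see above) handles the line's outer edges.
  def trimByRegex(line: String): String =
    line.replaceAll("""\s*~\s*""", "~").trim

  def main(args: Array[String]): Unit = {
    val input =
      "20161111 ~     ~10~1234~ \"This is an example\" ~P15~-EXAMPLE~2017~ 2014567EXAMPLE"
    // Both lines print:
    // 20161111~~10~1234~"This is an example"~P15~-EXAMPLE~2017~2014567EXAMPLE
    println(trimBySplit(input))
    println(trimByRegex(input))
  }
}
```

In a Spark job, either helper could be passed to the RDD directly, for example rdd.map(TrimAroundDelimiter.trimBySplit). Spaces inside a field, such as those in "This is an example", are left untouched by both versions.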

Upvotes: 2
