Reputation: 109
I have a CSV file.
This is my input:
a _ \_ \ b_c b\_c "
Now I want to convert a space-delimited file to a CSV file. How can I do that?
Fields not specified are considered "String 0" and are not enclosed in quotes.
These are the specifications:
1. The string "_" by itself is converted to a null string.
   (The -n option changes the "_" character.)
2. The string \c is converted to c.
3. The backslash character \ by itself is converted to a space.
4. The underscore is converted to a space if it occurs in a string.
   (The -s option changes the "_" character.)
5. A \n at the end of a line is automatically converted to \r\n.
6. Within String 1, " is converted to "".
I want to have the desired output result as below. Please help me.
"a","","_"," ","b c","b_c",""""
Upvotes: 1
Views: 256
Reputation: 711
The requirements are a little confusing to me, but you can try this, which produces the expected output:
import scala.util.matching.Regex

val input = "a _ \\_ \\ b_c b\\_c \""

// Replacements to apply to each field, in order (the first one is applied first)
val replacements: List[(Regex, String)] = List(
  ("""^_$""".r, ""),         // a lone "_" becomes an empty string
  ("""(?<!\\)_""".r, " "),   // an unescaped "_" becomes a space
  ("""\\(.)""".r, "$1"),     // "\c" becomes "c"
  ("""\\""".r, " "),         // a remaining lone backslash becomes a space
  (""""""".r, "\"\""))       // a double quote is doubled

// Applies every replacement in the list to the input string, one after another
def applyReplacements(inputString: String, replacements: List[(Regex, String)]): String =
  replacements match {
    case Nil =>
      inputString
    case replacement :: tail =>
      applyReplacements(
        replacement._1.replaceAllIn(inputString, replacement._2),
        tail)
  }

// Splits a line on spaces, converts each field, and joins the quoted fields with commas
def processLine(input: String): String = {
  val inputArray = input.split(" ")
  val outputArray = inputArray.map(x => applyReplacements(x, replacements))
  val finalLine = outputArray.map(x => s"""\"${x}\"""").mkString(",")
  // Use s"${finalLine}\r\n" instead if you need the '\r\n' ending
  finalLine
}

processLine(input)
// output:
// String = "a","","_"," ","b c","b_c",""""
You will probably need to modify it to fully match your requirements, which are not entirely clear to me.
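One such modification could be support for the -n and -s options. The sketch below assumes that -n replaces the "_" that stands for a null string and that -s replaces the "_" that stands for a space; that reading of the specification is a guess. The names FieldRules, buildReplacements, nullMarker, spaceMarker, and convertField are hypothetical and do not appear in the question.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object FieldRules {
  // Hypothetical helper: builds the ordered replacement list for a
  // configurable null marker (assumed meaning of -n) and space marker
  // (assumed meaning of -s). Both default to "_", as in the question.
  def buildReplacements(nullMarker: String = "_",
                        spaceMarker: String = "_"): List[(Regex, String)] = {
    // Quote the markers so characters such as "+" or "." are matched literally
    val n = Pattern.quote(nullMarker)
    val s = Pattern.quote(spaceMarker)
    List(
      (("^" + n + "$").r, ""),           // a lone null marker becomes an empty string
      (("""(?<!\\)""" + s).r, " "),      // an unescaped space marker becomes a space
      ("""\\(.)""".r, "$1"),             // "\c" becomes "c"
      ("""\\""".r, " "),                 // a remaining lone backslash becomes a space
      ("\"".r, "\"\"")                   // a double quote is doubled
    )
  }

  // Applies the rules to one field, in list order
  def convertField(field: String, rules: List[(Regex, String)]): String =
    rules.foldLeft(field) { case (acc, (regex, replacement)) =>
      regex.replaceAllIn(acc, replacement)
    }
}
```

With the defaults, FieldRules.convertField("b_c", FieldRules.buildReplacements()) gives "b c". With spaceMarker = "+", the field "b+c" gives "b c" while "b_c" is left unchanged.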
If you need to apply this to a Spark RDD, pass processLine to map so that it processes every line in the RDD.
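A minimal sketch of the RDD case, assuming Spark is on the classpath and a local master is acceptable for testing. The object name ConvertLines, the app name, and the sample lines are made up for illustration; processLine uses the same rules as above, written with foldLeft so the block stands on its own.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.matching.Regex

object ConvertLines {
  // Same replacement rules as above, applied in order
  val replacements: List[(Regex, String)] = List(
    ("""^_$""".r, ""),
    ("""(?<!\\)_""".r, " "),
    ("""\\(.)""".r, "$1"),
    ("""\\""".r, " "),
    ("\"".r, "\"\""))

  // Converts one space-delimited line into one quoted CSV line
  def processLine(line: String): String =
    line.split(" ").map { field =>
      val converted = replacements.foldLeft(field) { case (acc, (regex, replacement)) =>
        regex.replaceAllIn(acc, replacement)
      }
      "\"" + converted + "\""
    }.mkString(",")

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for trying the code out on one machine
    val spark = SparkSession.builder()
      .appName("space-to-csv")
      .master("local[*]")
      .getOrCreate()

    // Illustrative input; in practice this could come from sc.textFile(...)
    val lines = spark.sparkContext.parallelize(Seq(
      "a _ \\_ \\ b_c b\\_c \"",
      "x_y \\z"))

    // map runs processLine on every line of the RDD
    val csvLines = lines.map(processLine)
    csvLines.collect().foreach(println)

    spark.stop()
  }
}
```

Keeping the rules and processLine in a top-level object means the function passed to map does not capture any surrounding non-serializable state.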
Hope it helps.
Upvotes: 1