Reputation: 23109
What is the best way to read a text file that uses "^*~" as the record (newline) delimiter and "^|&" as the column delimiter? My file has a large number of columns (more than 100), so please suggest an efficient approach. Below is a sample of the file with only a few fields.
The file looks like this:
abcd^|&cdef^|&25^|&hile^|&12345^*~xyxxx^|&zzzzz^|&70^|&dharan^|&6567576
I want the result to look like this:
fname lname age address phone
abcd cdef 25 abc 1234523
xyxxx zzzzz 70 xyz 6567576
Upvotes: 2
Views: 1786
Reputation: 40370
String.split takes a regular expression, so the delimiter characters have to be escaped. Use flatMap with a split on the escaped record delimiter to get one element per line, then split each line on the escaped column delimiter in the same way, and pattern match on the resulting array to get tuples:
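// In spark-shell, sc and the implicits needed for toDF are already in scope;
// elsewhere add: import spark.implicits._ (spark being your SparkSession)
val str = "abcd^|&cdef^|&25^|&hile^|&12345^*~xyxxx^|&zzzzz^|&70^|&dharan^|&6567576"
val rdd = sc.parallelize(Seq(str))
// First split into records, then split each record into its fields
val rdd2 = rdd.flatMap(_.split("\\^\\*~")).map(_.split("\\^\\|\\&") match {
// Throws a MatchError for any record that does not have exactly 5 fields
case Array(a, b, c, d, e) => (a, b, c, d, e)
})
rdd2.toDF("fname","lname","age","address","phone").show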
// +-----+-----+---+-------+-------+
// |fname|lname|age|address|  phone|
// +-----+-----+---+-------+-------+
// | abcd| cdef| 25|   hile|  12345|
// |xyxxx|zzzzz| 70| dharan|6567576|
// +-----+-----+---+-------+-------+
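The tuple pattern above works for five fields, but Scala tuples stop at 22 elements, so it will not stretch to the 100+ columns mentioned in the question. As a rough sketch of the same two-step split without tuples, the plain-Scala helper below keeps each record as a sequence of fields. The names DelimitedParser and parse are hypothetical (they are not part of Spark or of the answer above), and Pattern.quote is used here in place of hand-written backslash escapes.

```scala
import java.util.regex.Pattern

object DelimitedParser {
  // Delimiters taken from the question. Pattern.quote turns each one into a
  // literal regex, so ^, * and | need no manual escaping.
  val RecordDelimiter: String = Pattern.quote("^*~")
  val FieldDelimiter: String = Pattern.quote("^|&")

  // Hypothetical helper: split the raw text into records, then split each
  // record into fields. The -1 limit keeps trailing empty fields, so every
  // record retains its full column count.
  def parse(raw: String): Seq[Seq[String]] =
    raw.split(RecordDelimiter)
      .filter(_.nonEmpty)                       // drop empty records
      .map(_.split(FieldDelimiter, -1).toSeq)   // one Seq[String] per record
      .toSeq
}
```

Inside Spark, the same two splits can be applied in flatMap and map as in the answer above. One possible way to get a wide DataFrame from there, assuming you have the column names, is to turn each field sequence into a Row with Row.fromSeq and pass the RDD to spark.createDataFrame together with a StructType schema. When reading from an actual file rather than an in-memory string, setting the Hadoop property textinputformat.record.delimiter to "^*~" may let the input format do the record split for you.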
Upvotes: 3