Reputation: 85
I am trying to use a DataFrame as an RDD. When I use the map method and extract the elements, the "[" and "]" characters are extracted as well. How can I avoid this? Is there a mistake in my logic? The details are below.
The DataFrame is named olympics:
scala> olympics.first
res1812: org.apache.spark.sql.Row = [Michael Phelps,23,United States,2008,8/24/2008,Swimming,8,0,0,8]
Using the DataFrame as an RDD with the map method:
scala> olympics.map(x => x.toString.split(",")).first
res1814: Array[String] = Array([Michael Phelps, 23, United States, 2008, 8/24/2008, Swimming, 8, 0, 0, 8])
As you can see in the above result, the characters "[" "]" are also being extracted.
Expected result is:
Array[String] = Array(Michael Phelps, 23, United States, 2008, 8/24/2008, Swimming, 8, 0, 0, 8)
I do not want the first and last characters of the row to be captured. I tried using substring, but it only extracts the first element.
How do I solve this? Please help.
Upvotes: 1
Views: 55
Reputation: 23119
If you are writing the data to an output file or a database, you can make a single string from the row's fields, without the "[" and "]", as below:
olympics.map(x => x.mkString(",")).first
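If you need an Array[String] rather than one string, note that the brackets come from Row.toString, which wraps the comma-joined fields in "[" and "]". A minimal sketch of stripping them before splitting is below; splitRowString is a hypothetical helper name (not part of Spark), and the sample string is copied from the question. It assumes no field itself contains a comma.

```scala
// Hypothetical helper (not part of Spark): removes the "[" and "]" that
// Row.toString places around the fields, then splits on commas.
// Assumption: no individual field contains a comma.
def splitRowString(rowText: String): Array[String] =
  rowText.stripPrefix("[").stripSuffix("]").split(",")

// String form of the first row, copied from the question.
val rowText = "[Michael Phelps,23,United States,2008,8/24/2008,Swimming,8,0,0,8]"

val fields = splitRowString(rowText)

// Print the fields separated by " | " to show there are no brackets left.
println(fields.mkString(" | "))
```

In spark-shell this could probably be applied as olympics.rdd.map(r => splitRowString(r.toString)).first. To skip the string round trip entirely, olympics.rdd.map(r => r.toSeq.map(_.toString).toArray).first should also work, assuming no field is null.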
Hope this helps!
Upvotes: 0