Jayson

Reputation: 85

Using a DataFrame as an RDD

I am trying to use a DataFrame as an RDD. When I call map and split each row into its elements, the "[" and "]" characters are extracted as well. How can I avoid this? Is there a mistake in my logic? The details are below.

The DataFrame is named olympics:

scala> olympics.first
res1812: org.apache.spark.sql.Row = [Michael Phelps,23,United States,2008,8/24/2008,Swimming,8,0,0,8]

Using the DataFrame as an RDD with the map method:

scala> olympics.map(x => x.toString.split(",")).first
res1814: Array[String] = Array([Michael Phelps, 23, United States, 2008, 8/24/2008, Swimming, 8, 0, 0, 8])

As the result above shows, the "[" and "]" characters are also extracted: the first element is "[Michael Phelps" and the last is "8]".

The expected result is:

Array[String] = Array(Michael Phelps, 23, United States, 2008, 8/24/2008, Swimming, 8, 0, 0, 8)

I do not want the first and last characters of the row to be captured. I tried using substring, but it only extracts the first element.

How do I solve this? Please help.

Upvotes: 1

Views: 55

Answers (1)

koiralo

Reputation: 23119

If you are writing the data to an output file or a database, you can build a single comma-separated string from the row's values, without the "[" and "]", as below. Note that this returns a String per row, not an Array[String].

olympics.map(x => x.mkString(",")).first
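If you need an Array[String] per row, as in the expected result, one option is to convert each field on its own instead of splitting the row's string form. The brackets appear because Row.toString renders the row as "[v1,v2,...]". The following is only a rough sketch in plain Scala, with a Seq[Any] standing in for a Spark Row so that it runs without Spark; the object name, the sample values, and the helper names are made up for illustration, and the Spark line in the comment assumes olympics.rdd and Row.toSeq are available in your version.

```scala
// Plain-Scala stand-in for a Spark Row (assumption: a Seq[Any] models the row).
// Row.toString renders as mkString("[", ",", "]"), which is where the
// brackets in the question come from.
object RowFieldsSketch {
  // Hypothetical sample mirroring the output of olympics.first
  val sample: Seq[Any] =
    Seq("Michael Phelps", 23, "United States", 2008, "8/24/2008",
        "Swimming", 8, 0, 0, 8)

  // Reproduces what x.toString.split(",") does: the first and last
  // elements keep the "[" and "]" characters.
  def viaToString(values: Seq[Any]): Array[String] =
    values.mkString("[", ",", "]").split(",")

  // Converts each field separately, so no brackets are involved and a
  // field that itself contains a comma is not split. Null fields are
  // rendered as the text "null".
  def fields(values: Seq[Any]): Array[String] =
    values.map(v => Option(v).fold("null")(_.toString)).toArray

  def main(args: Array[String]): Unit = {
    println(viaToString(sample).mkString(" | "))
    println(fields(sample).mkString(" | "))
  }
}

// With Spark, the per-field version would look roughly like this
// (untested sketch, not run against the asker's data):
//   olympics.rdd.map(_.toSeq.map(v => Option(v).fold("null")(_.toString)).toArray).first
```

Going through the individual fields also avoids a second problem with split(","): a value that contains a comma would otherwise be broken into two elements.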

Hope this helps!

Upvotes: 0
