Reputation: 21
I have an RDD[String] which contains the following data, in the format ('Movie Name', 'Actress Name'):
('Night of the Demons (2009) (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')
('The Bad Lieutenant: Port of Call - New Orleans (2009) (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')
('"Please Like Me" (2013) {All You Can Eat (#1.4)}', '$haniqua')
('"Please Like Me" (2013) {French Toast (#1.2)}', '$haniqua')
('"Please Like Me" (2013) {Horrible Sandwiches (#1.6)}', '$haniqua')
I want to convert this to an RDD[(String, String)], so that the first element within ' ' becomes the first String of each pair and the second element within ' ' becomes the second String.
I tried this:
val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
val splitRdd = rdd1.map( line => line.split(",") )
splitRdd.foreach(println)
but it gives me what looks like an error:
[Ljava.lang.String;@7741fb9
[Ljava.lang.String;@225f63a5
[Ljava.lang.String;@63640bc4
[Ljava.lang.String;@1354c1de
Upvotes: 1
Views: 2129
Reputation: 8957
Try this to convert RDD[String] to RDD[(String, String)]:
val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
// split only at the first comma, so a comma inside the actress field is kept
val splitRdd = rdd1.map { line => val f = line.split(",", 2); (f(0), f(1)) }
This returns a pair RDD, i.e. an RDD[(String, String)] whose elements are Tuple2 key/value pairs.
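The actress field in the first two sample lines contains a comma of its own ("Steff", Stefanie Oxmann Mcgaha), so a plain split(",") cuts it into two pieces and index (1) gets only the "Steff" part. Passing a limit of 2 to String.split keeps everything after the first comma together. Neither version removes the surrounding parentheses, quotes or leading space. A plain-Scala sketch on one sample line (no Spark needed, since String.split is what runs inside map; the object name SplitDemo is just for illustration):

```scala
object SplitDemo {
  // One of the sample lines from the question.
  val line = """('Night of the Demons (2009) (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')"""

  def main(args: Array[String]): Unit = {
    // Splitting on every comma yields three pieces, because the actress
    // field contains a comma of its own.
    val all = line.split(",")
    println(all.length) // 3

    // With a limit of 2, the split happens only at the first comma and the
    // rest of the line (inner comma included) stays in the second piece.
    val two = line.split(",", 2)
    println(two.length) // 2
    println(two(1))     // still carries the leading space, the quotes and the ')'
  }
}
```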
Upvotes: 0
Reputation: 1323
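Since each row is enclosed in parentheses, each field is enclosed in quotes, and a field can itself contain a comma (as in '"Steff", Stefanie Oxmann Mcgaha'), a simple split doesn't work; you need to parse each line with a regular expression.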
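A minimal sketch of the regular-expression approach, assuming every line has exactly the shape ('<movie>', '<actress>') shown in the question, with the two fields separated by the literal ', ' sequence. Lines that don't match (for example, if a name containing an apostrophe had been written with double quotes instead) are simply dropped. The parsing itself is plain Scala; the names ParseActress and parseLine are made up for illustration.

```scala
object ParseActress {
  // Anchored pattern: "('", the movie, "', '", the actress, "')".
  // Assumption: "', '" occurs only once per line; the first group is greedy,
  // so a second occurrence would be absorbed into the movie name.
  private val Pair = """^\('(.*)', '(.*)'\)$""".r

  // Some((movie, actress)) for a well-formed line, None otherwise.
  def parseLine(line: String): Option[(String, String)] = line.trim match {
    case Pair(movie, actress) => Some((movie, actress))
    case _                    => None
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      """('Night of the Demons (2009) (uncredited)', '"Steff", Stefanie Oxmann Mcgaha')""",
      """('"Please Like Me" (2013) {French Toast (#1.2)}', '$haniqua')"""
    )
    // Same call shape as on an RDD[String]; flatMap drops the Nones.
    sample.flatMap(line => parseLine(line)).foreach(println)
  }
}
```

With Spark, the same function plugs into the question's RDD as rdd1.flatMap(line => ParseActress.parseLine(line)), which gives an RDD[(String, String)] without the parentheses and quotes.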
Upvotes: 0
Reputation: 1734
It's not an error. You could also use flatMap() here to avoid the confusion:
val rdd1 = sc.textFile("/home/user1/Documents/TestingScala/actress")
rdd1.flatMap( line => line.split(",")).foreach(println)
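Here, the function passed to map returns a single element (an array) per line, while the function passed to flatMap returns a collection of zero or more elements, and flatMap flattens those collections into one RDD of individual strings, which println can print directly.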
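The difference is easy to see on an ordinary Scala collection, whose map and flatMap behave like the RDD methods of the same name. A small sketch using two short made-up lines (the object name MapVsFlatMap is just for illustration):

```scala
object MapVsFlatMap {
  val lines = Seq("a,b", "c,d,e")

  def main(args: Array[String]): Unit = {
    // map: one Array[String] per input line, so the result is a Seq of arrays.
    val mapped: Seq[Array[String]] = lines.map(line => line.split(","))
    println(mapped.length)      // 2

    // flatMap: the per-line arrays are flattened into one Seq of strings.
    val flat: Seq[String] = lines.flatMap(line => line.split(","))
    println(flat.length)        // 5
    println(flat.mkString(" ")) // a b c d e
  }
}
```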
Upvotes: 0
Reputation: 4623
[Ljava.lang.String;@7741fb9 is not an error; it is what gets printed when you print an array, because arrays use the default Object.toString:
[ - a single-dimensional array
L - the element type is a class or interface
java.lang.String; - the type of the objects in the array (the ; ends the class name)
@ - separates the type name from the hash code
7741fb9 - the hash code of the object, in hexadecimal
To print the contents of a String array, you can try this code:
import scala.runtime.ScalaRunTime._
splitRdd.foreach(array => println(stringOf(array)))
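A plain-Scala sketch (no Spark) showing where the [Ljava.lang.String;@... text comes from, plus two ways to print the contents instead: ScalaRunTime.stringOf, as above, and the standard mkString. The hash part of the first line varies from run to run; the object name PrintArray is just for illustration.

```scala
import scala.runtime.ScalaRunTime.stringOf

object PrintArray {
  def main(args: Array[String]): Unit = {
    val arr: Array[String] = "Night of the Demons,Steff".split(",")

    // Arrays inherit Object.toString: "[Ljava.lang.String;@" + hex hash code.
    println(arr.toString)

    // stringOf renders the elements, e.g. Array(Night of the Demons, Steff).
    println(stringOf(arr))

    // mkString joins the elements with a separator of your choice.
    println(arr.mkString(" | "))
  }
}
```

Either call can replace println(array) inside splitRdd.foreach.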
Upvotes: 5