Reputation: 1383
I am creating an RDD using parallelized collections in Apache Spark. However, when I call the parallelize method on the Spark context, the method takes multiple arguments, whereas it has been documented everywhere as taking only a single list parameter. I am unable to understand what the additional two parameters do, as the Spark documentation doesn't explain them clearly either. Following is the message I get when I pass a single argument:
The method parallelize(Seq<T>, int, ClassTag<T>) in the type SparkContext is not applicable for the arguments (List<Integer>)
Following is my code:
List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> distData = sc.parallelize(data);
Upvotes: 0
Views: 1370
Reputation: 25909
You should use the JavaSparkContext in Java (not the Scala SparkContext), which has a parallelize overload that accepts a List&lt;T&gt; directly. The extra parameters in your error come from the Scala method's signature, parallelize(Seq&lt;T&gt;, int, ClassTag&lt;T&gt;): in Scala the slice count has a default value and the ClassTag is supplied implicitly, but Java has no defaults or implicits, so the Java-friendly wrapper exists instead. See http://spark.apache.org/docs/0.6.0/api/core/spark/api/java/JavaSparkContext.html
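A minimal sketch of what that looks like end to end (the app name and local master string are illustrative, and the lambda syntax assumes a Spark/Java version recent enough to support Java 8 lambdas):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelizeExample {
    public static void main(String[] args) {
        // "local[*]" runs Spark in-process using all cores; app name is arbitrary
        SparkConf conf = new SparkConf()
                .setAppName("ParallelizeExample")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);

        // JavaSparkContext.parallelize accepts a java.util.List directly;
        // an optional second int argument sets the number of partitions,
        // e.g. sc.parallelize(data, 2)
        JavaRDD<Integer> distData = sc.parallelize(data);

        // Simple action to confirm the RDD works
        int sum = distData.reduce((a, b) -> a + b);
        System.out.println("Sum: " + sum);

        sc.stop();
    }
}
```

The key point is that JavaSparkContext wraps the Scala SparkContext and handles the ClassTag machinery for you, so from Java you never pass it explicitly.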
Upvotes: 4