adarsh hegde

Reputation: 1383

Parallelized collection using spark context method arguments are incorrect

I am creating an RDD from a parallelized collection in Apache Spark. However, when I call the parallelize method on the Spark context, the method expects multiple arguments, whereas it is documented everywhere as taking only a single list parameter. I cannot work out what the two additional parameters do, and the Spark documentation does not explain them clearly either. The following is the error I get when I pass a single argument:

The method parallelize(Seq<T>, int, ClassTag<T>) in the type SparkContext is not applicable for the arguments (List<Integer>)

Following is my code:

List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> distData = sc.parallelize(data);

Upvotes: 0

Views: 1370

Answers (1)

Arnon Rotem-Gal-Oz

Reputation: 25909

In Java you should use JavaSparkContext (not the Scala SparkContext). Its parallelize method is overloaded to accept a plain List&lt;T&gt;, so no ClassTag or partition count is needed. See http://spark.apache.org/docs/0.6.0/api/core/spark/api/java/JavaSparkContext.html
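A minimal sketch of the fix (the master URL `local[*]`, the app name, and the Java 8 lambda are assumptions for a local run, not part of the question's setup):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelizeExample {
    public static void main(String[] args) {
        // JavaSparkContext wraps the Scala SparkContext and exposes
        // Java-friendly signatures such as parallelize(List<T>).
        // "local[*]" and the app name are placeholders for a local test run.
        JavaSparkContext sc = new JavaSparkContext("local[*]", "ParallelizeExample");

        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        // Compiles cleanly: the Java API needs neither a ClassTag
        // nor an explicit number of partitions.
        JavaRDD<Integer> distData = sc.parallelize(data);

        int sum = distData.reduce((a, b) -> a + b);
        System.out.println(sum); // sum of 1..5 is 15

        sc.stop();
    }
}
```

The Scala `SparkContext.parallelize(Seq<T>, int, ClassTag<T>)` signature from the error message surfaces because Scala's implicit ClassTag and default partition count become explicit parameters when called from Java; JavaSparkContext hides both.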

Upvotes: 4
