Reputation: 1500
We know that Spark has a method, rdd.collect, which converts an RDD to a list.
List<String> f= rdd.collect();
String[] array = f.toArray(new String[f.size()]);
I am trying to do exactly the opposite in my project. I have an ArrayList of String that I want to convert to a JavaRDD. I have been looking for a solution for quite some time but have not found one. Can anybody please help me out here?
Upvotes: 36
Views: 59445
Reputation: 3462
If you are using a .scala file, or you don't want to or cannot use JavaSparkContext, then you could convert the Java list to a Scala collection and call parallelize on the plain SparkContext.
For example:
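import scala.collection.JavaConverters._
val javaList = new java.util.ArrayList[String]()
javaList.add("abc")
javaList.add("def")
// sc is an existing SparkContext; asScala wraps the Java list as a Scala collection
sc.parallelize(javaList.asScala.toSeq)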
This will generate an RDD for you.
Upvotes: 0
Reputation: 35444
Adding to Sean Owen's and others' solutions:
You can use JavaSparkContext#parallelizePairs for a List of Tuple2:
List<Tuple2<Integer, Integer>> pairs = new ArrayList<>();
pairs.add(new Tuple2<>(0, 5));
pairs.add(new Tuple2<>(1, 3));
JavaSparkContext sc = new JavaSparkContext();
JavaPairRDD<Integer, Integer> rdd = sc.parallelizePairs(pairs);
Upvotes: 6
Reputation: 1228
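There are two ways to convert a collection to an RDD:
1) sc.parallelize(collection)
2) sc.makeRDD(collection)
For a plain collection the two methods behave identically (makeRDD simply delegates to parallelize), so you can use either. Note that makeRDD exists only on the Scala SparkContext, not on JavaSparkContext.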
Upvotes: 4
Reputation: 65
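// Assumes spark is an existing SparkSession.
List<StructField> fields = new ArrayList<>();
fields.add(DataTypes.createStructField("fieldx1", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("fieldx2", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("fieldx3", DataTypes.LongType, true));
// Build the schema that createDataFrame expects from the field list.
StructType schema = DataTypes.createStructType(fields);
List<Row> data = new ArrayList<>();
// Values must match the field types: two strings and a long.
data.add(RowFactory.create("", "", 0L));
Dataset<Row> rawDataSet = spark.createDataFrame(data, schema);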
Upvotes: -3
Reputation: 66891
You're looking for JavaSparkContext.parallelize(List) and similar. This is just like in the Scala API.
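To make that concrete, here is a minimal sketch of the round trip from a Java list to a JavaRDD and back. The class name ListToRddExample, the helper toRdd, the app name, and the local[2] master are all assumptions made for illustration; in a real job you would reuse whatever JavaSparkContext your application already has.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical example class; the name is not from the original thread.
public class ListToRddExample {

    // Distributes a local Java list as a JavaRDD using the given context.
    static JavaRDD<String> toRdd(JavaSparkContext sc, List<String> items) {
        return sc.parallelize(items);
    }

    public static void main(String[] args) {
        // Assumption: local mode with two worker threads, for illustration only.
        SparkConf conf = new SparkConf()
                .setAppName("ListToRddExample")
                .setMaster("local[2]");

        // JavaSparkContext implements Closeable, so try-with-resources stops it.
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<String> items = new ArrayList<>();
            items.add("abc");
            items.add("def");

            JavaRDD<String> rdd = toRdd(sc, items);

            // collect() brings the elements back to the driver as a List.
            List<String> roundTrip = rdd.collect();
            System.out.println(roundTrip);
        }
    }
}
```

There is also an overload, parallelize(list, numSlices), if you want to control the number of partitions yourself.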
Upvotes: 58