Reputation: 211
I have an RDD called
JavaPairRDD<String, List<String>> existingRDD;
Now I need to initialize existingRDD to an empty RDD so that, when I get the actual RDDs, I can union them with it.
How do I initialize existingRDD to an empty RDD rather than setting it to null?
Here is my code:
JavaPairRDD<String, List<String>> existingRDD;
if (ai.get() % 10 == 0)
{
    existingRDD.saveAsNewAPIHadoopFile("s3://manthan-impala-test/kinesis-dump/" + startTime + "/" + k + "/" + System.currentTimeMillis() + "/",
        NullWritable.class, Text.class, TextOutputFormat.class); //on worker failure this will get overwritten
}
else
{
    existingRDD.union(rdd);
}
Upvotes: 19
Views: 34089
Reputation: 735
You can try the snippet below:
val emptyRDD = spark.emptyDataset[T].rdd
Upvotes: 0
Reputation: 40380
To create an empty RDD in Java, you just need to do the following:
// Get an RDD that has no partitions or elements.
JavaSparkContext jsc;
...
JavaRDD<T> emptyRDD = jsc.emptyRDD();
I trust you know how to use generics; otherwise, for your case, you'll need:
JavaRDD<Tuple2<String,List<String>>> emptyRDD = jsc.emptyRDD();
// Wrap the empty JavaRDD of tuples (emptyRDD, declared above) as a pair RDD.
JavaPairRDD<String,List<String>> emptyPairRDD = JavaPairRDD.fromJavaRDD(
emptyRDD
);
You can also use the mapToPair method to convert your JavaRDD to a JavaPairRDD.
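For the use case in the question, a minimal sketch might look like the following. It assumes Spark is on the classpath and runs against a local master; the class name EmptyPairRddExample, the method unionIntoEmpty, and the sample key and values are hypothetical and only serve the illustration. Note that union returns a new RDD rather than modifying the one it is called on, so the result has to be assigned back.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class EmptyPairRddExample {

    // Starts from an empty pair RDD, unions one sample batch into it,
    // and returns the number of elements in the result.
    static long unionIntoEmpty(JavaSparkContext jsc) {
        // An RDD with no partitions and no elements, typed as tuples.
        JavaRDD<Tuple2<String, List<String>>> empty = jsc.emptyRDD();
        JavaPairRDD<String, List<String>> existingRDD = JavaPairRDD.fromJavaRDD(empty);

        // Hypothetical batch standing in for the "actual" RDD from the question.
        List<Tuple2<String, List<String>>> data = Collections.singletonList(
                new Tuple2<String, List<String>>("k1", Arrays.asList("a", "b")));
        JavaPairRDD<String, List<String>> batch = jsc.parallelizePairs(data);

        // RDDs are immutable: keep the RDD returned by union.
        existingRDD = existingRDD.union(batch);
        return existingRDD.count();
    }

    public static void main(String[] args) {
        // local[1] is an assumption for running the sketch on one machine.
        SparkConf conf = new SparkConf().setAppName("empty-pair-rdd").setMaster("local[1]");
        try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
            System.out.println(unionIntoEmpty(jsc));
        }
    }
}
```

Starting from an empty RDD instead of null means the first union needs no special case.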
In Scala:
val sc: SparkContext = ???
...
val emptyRDD = sc.emptyRDD
// emptyRDD: org.apache.spark.rdd.EmptyRDD[Nothing] = EmptyRDD[1] at ...
Upvotes: 30
Reputation: 21
In Java, create an empty pair RDD as follows:
// jsc is assumed to be a JavaSparkContext instance; emptyRDD() is an instance method.
JavaPairRDD<K, V> emptyPairRDD = JavaPairRDD.fromJavaRDD(jsc.emptyRDD());
Upvotes: 0
Reputation: 1871
val emptyRdd=sc.emptyRDD[String]
The statement above creates an empty RDD of type String.
From the SparkContext class:
/** Get an RDD that has no partitions or elements. */
def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T](this)
Upvotes: 4
Reputation: 2959
In Java, creating the empty RDD was a little complex. I tried using scala.reflect.ClassTag, but that did not work either. After many tests, the code that worked turned out to be even simpler.
private JavaRDD<Foo> getEmptyJavaRdd() {
    /* this did not compile for me because emptyRDD requires <T> as a parameter */
    // JavaRDD<Foo> emptyRDD = sparkContext.emptyRDD();
    // return emptyRDD;
    /* this should be the solution that emulates the Scala <T>, */
    /* but I could not make it work either */
    // ClassTag<Foo> tag = scala.reflect.ClassTag$.MODULE$.apply(Foo.class);
    // return sparkContext.emptyRDD(tag);
    /* this alternative worked in Java 8; parallelize is called on the */
    /* context instance (assumed here to be a JavaSparkContext) */
    return sparkContext.parallelize(
        java.util.Arrays.asList()
    );
}
Upvotes: 0
Reputation: 728
@eliasah's answer is very useful; I am providing code to create an empty pair RDD. Consider a scenario in which an empty pair RDD (key, value) is required. The following Scala code illustrates how to create an empty pair RDD with a String key and an Int value.
type pairRDD = (String,Int)
var resultRDD = sparkContext.emptyRDD[pairRDD]
The RDD is created as follows:
resultRDD: org.apache.spark.rdd.EmptyRDD[(String, Int)] = EmptyRDD[0] at emptyRDD at <console>:29
Upvotes: 0
Reputation: 17
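In Scala, I used the parallelize method:
// Seq("") would produce an RDD with one element (an empty string), so pass an empty Seq instead.
val emptyRDD = sc.parallelize(Seq.empty[String])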
Upvotes: 0