Chaitra Bannihatti

Reputation: 211

Initialize an RDD to empty

I have an RDD called

JavaPairRDD<String, List<String>> existingRDD; 

Now I need to initialize existingRDD to an empty RDD so that, when the actual RDDs arrive, I can union them with it. How do I initialize existingRDD to an empty RDD, other than setting it to null? Here is my code:

JavaPairRDD<String, List<String>> existingRDD;
if(ai.get()%10==0)
{
    existingRDD.saveAsNewAPIHadoopFile("s3://manthan-impala-test/kinesis-dump/" + startTime + "/" + k + "/" + System.currentTimeMillis() + "/",
    NullWritable.class, Text.class, TextOutputFormat.class); //on worker failure this will get overwritten                                  
}
else
{
    existingRDD.union(rdd);
}

Upvotes: 19

Views: 34089

Answers (7)

AshuGG

Reputation: 735

You can try the snippet below (Scala, using a SparkSession):

val emptyRDD = spark.emptyDataset[T].rdd
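
For the Java context of the question, a rough equivalent via SparkSession might look like the sketch below (spark is assumed to be an existing SparkSession, and String is just an illustrative element type):

// Build an empty Dataset and convert it to a JavaRDD.
Dataset<String> emptyDs = spark.emptyDataset(Encoders.STRING());
JavaRDD<String> emptyRDD = emptyDs.toJavaRDD();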

Upvotes: 0

eliasah

Reputation: 40380

To create an empty RDD in Java, you just need to do the following:

// Get an RDD that has no partitions or elements.
JavaSparkContext jsc;
...
JavaRDD<T> emptyRDD = jsc.emptyRDD();

I trust you know how to use generics; otherwise, for your case, you'll need:

JavaRDD<Tuple2<String,List<String>>> emptyRDD = jsc.emptyRDD();
JavaPairRDD<String,List<String>> emptyPairRDD = JavaPairRDD.fromJavaRDD(
  emptyRDD
);

You can also use the mapToPair method to convert your JavaRDD to a JavaPairRDD.
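
For example, a minimal sketch of that conversion (the pass-through lambda is all that is needed, since each element is already a Tuple2):

JavaRDD<Tuple2<String, List<String>>> emptyRDD = jsc.emptyRDD();
// Each element is already a Tuple2, so the pair function can return it unchanged.
JavaPairRDD<String, List<String>> emptyPairRDD = emptyRDD.mapToPair(tuple -> tuple);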

In Scala:

val sc: SparkContext = ???
... 
val emptyRDD = sc.emptyRDD
// emptyRDD: org.apache.spark.rdd.EmptyRDD[Nothing] = EmptyRDD[1] at ...

Upvotes: 30

osac

Reputation: 21

In Java, create an empty pair RDD as follows (jsc is a JavaSparkContext):

JavaPairRDD<K, V> emptyPairRDD = JavaPairRDD.fromJavaRDD(jsc.<Tuple2<K, V>>emptyRDD());
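
A usage sketch in the question's context (incomingRDD is an illustrative name; note that union returns a new RDD, so the result must be reassigned):

// Start from an empty pair RDD and fold incoming batches into it.
JavaPairRDD<String, List<String>> existingRDD =
    JavaPairRDD.fromJavaRDD(jsc.<Tuple2<String, List<String>>>emptyRDD());

// Later, when a real batch arrives:
existingRDD = existingRDD.union(incomingRDD);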

Upvotes: 0

Thirupathi Chavati

Reputation: 1871

val emptyRdd = sc.emptyRDD[String]

The above statement will create an empty RDD of type String.

From the SparkContext class:

Get an RDD that has no partitions or elements

def emptyRDD[T: ClassTag]: EmptyRDD[T] = new EmptyRDD[T](this)

Upvotes: 4

Thiago Mata

Reputation: 2959

In Java, creating the empty RDD was a little more complex. I tried using scala.reflect.ClassTag, but it did not work either. After many tests, the code that worked was even simpler.

private JavaRDD<Foo> getEmptyJavaRdd() {

    /* This does not compile, because emptyRDD() requires the <T> type parameter */
    // JavaRDD<Foo> emptyRDD = sparkContext.emptyRDD();
    // return emptyRDD;

    /* This should be the way to emulate the Scala <T> parameter, */
    /* but I could not make it work either */
    // ClassTag<Foo> tag = scala.reflect.ClassTag$.MODULE$.apply(Foo.class);
    // return sparkContext.emptyRDD(tag);

    /* This alternative worked with Java 8 (sparkContext is a JavaSparkContext) */
    return sparkContext.parallelize(
            java.util.Arrays.asList()
    );
}
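
A quick way to check that the helper really returns an empty RDD (a sketch; the comments reflect the expected output of isEmpty and count on an RDD built from an empty list):

JavaRDD<Foo> empty = getEmptyJavaRdd();
System.out.println(empty.isEmpty()); // true
System.out.println(empty.count());   // 0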

Upvotes: 0

Nikhil Bhide

Reputation: 728

@eliasah's answer is very useful; here is code to create an empty pair RDD. Consider a scenario in which you need an empty pair RDD of (key, value). The following Scala code illustrates how to create an empty pair RDD with String keys and Int values.

type pairRDD = (String,Int)
var resultRDD = sparkContext.emptyRDD[pairRDD]

The RDD is created as follows:

resultRDD: org.apache.spark.rdd.EmptyRDD[(String, Int)] = EmptyRDD[0] at emptyRDD at <console>:29

Upvotes: 0

이동준

Reputation: 17

In Scala, I used the parallelize method.

val emptyRDD = sc.parallelize(Seq.empty[String]) // Seq("") would give an RDD with one element (an empty string), not an empty RDD

Upvotes: 0
