JavaRDD<String> to JavaRDD<Row>

Question

I am reading a txt file as a JavaRDD with the following command:

JavaRDD<String> vertexRDD = ctx.textFile(pathVertex);

Now, I would like to convert this to a JavaRDD<Row>, because the txt file has two columns of integers and I want to add a schema to the rows after splitting the columns.

I also tried this:

JavaRDD<Row> rows = vertexRDD.map(line -> line.split("\t"));

But it says I cannot assign the result of map to an "Object" RDD.

How can I create a JavaRDD<Row> out of a JavaRDD<String>?
How can I use map on the JavaRDD<String>?

Thanks!

Oli · Accepted Answer

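Applying a transformation such as map to a JavaRDD always creates a new JavaRDD, whose element type is whatever the lambda returns. Here, the lambda returns the result of split, so the new RDD is an RDD of arrays of strings (JavaRDD<String[]>), not an RDD of rows.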

To get an RDD of rows, create a Row from each array:

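JavaRDD<String> vertexRDD = ctx.textFile(pathVertex);
// split returns one String[] per line, so this is an RDD of arrays
JavaRDD<String[]> rddOfArrays = vertexRDD.map(line -> line.split("\t"));
// Wrap each array in a Row; the cast passes the array as the varargs values
JavaRDD<Row> rddOfRows = rddOfArrays.map(fields -> RowFactory.create((Object[]) fields));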
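The rows built above still hold String values. Since the file contains two integer columns, you may want to convert each field before building the Row so that it matches an integer schema. Below is a minimal plain-Java sketch of that per-line conversion; the class name VertexLineParser and the method parseLine are hypothetical, and it assumes every column is a valid integer and columns are separated by a single tab.

```java
import java.util.Arrays;

public class VertexLineParser {
    // Hypothetical helper: splits one tab-separated line and converts
    // every column to an Integer. It returns Object[] so the result can be
    // handed directly to RowFactory.create(Object...).
    // Assumes each column parses as an int; otherwise Integer.valueOf throws
    // NumberFormatException.
    public static Object[] parseLine(String line) {
        return Arrays.stream(line.trim().split("\t"))
                .map(Integer::valueOf)
                .toArray();
    }

    public static void main(String[] args) {
        Object[] fields = parseLine("1\t42");
        System.out.println(Arrays.toString(fields)); // prints [1, 42]
    }
}
```

In the Spark job, this could be used as vertexRDD.map(line -> RowFactory.create(VertexLineParser.parseLine(line))), and the resulting JavaRDD<Row> passed to spark.createDataFrame(rddOfRows, schema), where schema is a StructType with two DataTypes.IntegerType fields (column names of your choosing).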

Note that if your goal is then to turn the JavaRDD<Row> into a DataFrame (Dataset<Row>), there is a simpler way: set the delimiter option on spark.read() and skip RDDs altogether:

Dataset<Row> dataframe = spark.read()
    .option("delimiter", "\t")
    .csv("your_path/file.csv");
