Reputation: 13902
Assume Employee is a Java class.
I have a JavaRDD<Employee[]> arrayOfEmpList, i.e., each element of the RDD is an array of employees.
Out of this, I want to create a single list of employees, something like JavaRDD<Employee>.
This is what I tried:
I created a List<Employee> empList = new ArrayList<Employee>(); and then, for each Employee[] in the RDD:
arrayOfEmpList.foreach(new VoidFunction<Employee[]>() {
    public void call(Employee[] arg0) {
        empList.addAll(Arrays.asList(arg0));
        System.out.println(empList.size()); // prints correct values incrementally
    }
});
System.out.println(empList.size()); // gives 0
I am not able to get the correct size outside the foreach call.
Is there some other way to achieve this?
P.S.: I want to have all employee records in a separate RDD. So the 1st employee list may contain 10 records, the 2nd may contain 100 records, and the 3rd may contain 200 records. I want a final list of 310 records, which I can then parallelize and perform actions upon.
Upvotes: 0
Views: 3851
Reputation: 45319
What you need is the flatMap transformation on your RDD. I'm first converting each employee array into a list:
JavaRDD<Employee> employeeRDD = arrayOfEmpList.flatMap(empArray -> Arrays.asList(empArray));
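As far as I know, flatMap has no overload whose function returns an array directly, so the Arrays.asList wrapper is needed. Note that on Spark 2.0 and later the function must return an Iterator rather than an Iterable, so there you would write Arrays.asList(empArray).iterator().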
You can see this in the transformations section of the programming guide: http://spark.apache.org/docs/latest/programming-guide.html#transformations
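If it helps to see what flatMap does without a cluster, here is a minimal local sketch using plain Java streams rather than Spark. The class name FlattenSketch, the stand-in Employee class, and the helper sampleBatches are all made up for illustration; only the flattening step corresponds to the transformation above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FlattenSketch {
    // Hypothetical stand-in for the Employee class from the question.
    public static class Employee {
        public final int id;
        public Employee(int id) { this.id = id; }
    }

    // Hypothetical helper: builds one Employee[] per requested size,
    // numbering employees consecutively across all arrays.
    public static List<Employee[]> sampleBatches(int... sizes) {
        List<Employee[]> batches = new ArrayList<>();
        int nextId = 0;
        for (int size : sizes) {
            Employee[] batch = new Employee[size];
            for (int i = 0; i < size; i++) {
                batch[i] = new Employee(nextId++);
            }
            batches.add(batch);
        }
        return batches;
    }

    // Local analogue of flatMap on the RDD: each array is expanded
    // into its elements, and the results are concatenated in order.
    public static List<Employee> flatten(List<Employee[]> batches) {
        return batches.stream()
                .flatMap(Arrays::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> all = flatten(sampleBatches(10, 100, 200));
        System.out.println(all.size()); // prints 310
    }
}
```

The point of the sketch is that the result is a new flattened collection returned by the transformation, not a list mutated from inside a callback. With Spark the same idea applies, except that the result stays distributed as a JavaRDD<Employee>.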
Upvotes: 1