Join relatively small table with large table in Spark 2.1

Question

I am currently working on updating a table based on its existence on another table:

Ex:

Dataset A (relatively small, 300k of rows): DepartmentId, EmployeeId, Salary, Error

Dataset B (relatively huge, millions of rows): DepartmentId, EmployeeId, Salary

The logic is: 1. If A's (DepartmentId, EmployeeId) pair exists in B, then update A's salary with B's salary 2. Otherwise, write a message to A's error field

The solution I have now is doing a left outer join on A with B. Is there any other better practices for this type of problem?

Thank you in advance!

Join relatively small table with large table in Spark 2.1

Answers (1)

Related Questions