Reputation: 27
I have an employee table that contains each employee's ID and their supervisor's ID. I want to find the reporting hierarchy for each employee, up to five levels.
Example: employee 1 reports to 2, 2 reports to 4, 4 reports to 17, and 17 reports to 20. We can't find a supervisor for 20, so we treat 20 as its own supervisor (the same applies to 21, 12 and 11 in the expected output below).
EmployeeID | SupervisiorID |
---|---|
1 | 2 |
2 | 4 |
8 | 6 |
9 | 5 |
6 | 3 |
5 | 10 |
4 | 17 |
3 | 15 |
10 | 20 |
15 | 20 |
17 | 20 |
16 | 21 |
15 | 13 |
14 | 12 |
13 | 11 |
Expected output:
EmployeeID | SupervisiorID_1 | SupervisiorID_2 | SupervisiorID_3 | SupervisiorID_4 | SupervisiorID_5 |
---|---|---|---|---|---|
1 | 2 | 4 | 17 | 20 | 20 |
2 | 4 | 17 | 20 | 20 | 20 |
8 | 6 | 3 | 15 | 20 | 20 |
9 | 5 | 10 | 20 | 20 | 20 |
6 | 3 | 15 | 20 | 20 | 20 |
5 | 10 | 20 | 20 | 20 | 20 |
4 | 17 | 20 | 20 | 20 | 20 |
3 | 15 | 20 | 20 | 20 | 20 |
10 | 20 | 20 | 20 | 20 | 20 |
15 | 20 | 20 | 20 | 20 | 20 |
17 | 20 | 20 | 20 | 20 | 20 |
16 | 21 | 21 | 21 | 21 | 21 |
15 | 13 | 11 | 11 | 11 | 11 |
14 | 12 | 12 | 12 | 12 | 12 |
13 | 11 | 11 | 11 | 11 | 11 |
How can I achieve this recursively with Spark DataFrames?
Upvotes: 1
Views: 1464
Reputation: 18013
This has been asked many times before; the article at https://dwgeek.com/spark-sql-recursive-dataframe-pyspark-and-scala.html/ covers a solution for recursive DataFrames in Spark (PySpark and Scala).
Upvotes: 0
Reputation: 584
If you only need 5 levels, it is simpler to use 4 joins: the first supervisor level is already in the table, so each of the remaining four levels takes one more self-join. In my view, Spark doesn't natively support recursive solutions for this kind of scenario. If you really want to do it recursively, you may need to collect the data you need and process it locally on the driver.
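A minimal sketch of the driver-side option in plain Python, assuming the table is small enough to collect. In PySpark the input could come from something like `[(r["EmployeeID"], r["SupervisiorID"]) for r in df.collect()]`, where `df` is a hypothetical DataFrame with the question's columns. The helper names `build_supervisor_map`, `supervisor_chain` and `build_hierarchy` are illustrative, not from any library. Each recursive step does what one left self-join plus a `coalesce` back to the current ID would do in Spark, which is why the join version needs four joins for levels 2 to 5. The sample lists employee 15 twice; this sketch assumes the first row (15 reports to 20) is used when 15 is looked up as a supervisor, which reproduces the expected output in the question.

```python
from typing import Dict, Iterable, List, Tuple

# Sample data from the question: (EmployeeID, SupervisiorID).
sample_rows: List[Tuple[int, int]] = [
    (1, 2), (2, 4), (8, 6), (9, 5), (6, 3), (5, 10), (4, 17), (3, 15),
    (10, 20), (15, 20), (17, 20), (16, 21), (15, 13), (14, 12), (13, 11),
]


def build_supervisor_map(rows: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map each employee ID to one supervisor ID.

    Assumption: if an employee appears more than once (15 does in the
    sample), the first row wins for lookups.
    """
    sup_map: Dict[int, int] = {}
    for emp, sup in rows:
        sup_map.setdefault(emp, sup)
    return sup_map


def supervisor_chain(sup: int, sup_map: Dict[int, int], depth: int) -> List[int]:
    """Recursively return `depth` supervisor levels starting at `sup`.

    An ID with no supervisor row is treated as its own supervisor, so once
    the top of the chain is reached the same ID repeats.
    """
    if depth == 0:
        return []
    return [sup] + supervisor_chain(sup_map.get(sup, sup), sup_map, depth - 1)


def build_hierarchy(
    rows: Iterable[Tuple[int, int]], levels: int = 5
) -> List[Tuple[int, ...]]:
    """Return (EmployeeID, Supervisor_1, ..., Supervisor_<levels>) per input row."""
    row_list = list(rows)
    sup_map = build_supervisor_map(row_list)
    # One output row per input row, so both rows for employee 15 survive.
    return [(emp, *supervisor_chain(sup, sup_map, levels)) for emp, sup in row_list]


if __name__ == "__main__":
    for row in build_hierarchy(sample_rows):
        print(row)  # first line printed: (1, 2, 4, 17, 20, 20)
```

The recursion depth is bounded by `levels`, so a cycle in the data (for example, A reports to B and B reports to A) cannot recurse forever; it just repeats the cycle across the output columns.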
Upvotes: 1