Reputation: 361
I am used to using SQL to solve hierarchical joins but I'm wondering if it can be done in Python, maybe using Pandas. And which one is more efficient to go for?
CSV Data:
emp_id,fn,ln,mgr_id
1,Matthew,Reichek,NULL
2,John,Cottone,3
3,Chris,Winter,1
4,Sergey,Bobkov,2
5,Andrey,Botelli,2
6,Karen,Goetz,7
7,Tri,Pham,3
8,Drew,Thompson,7
9,BD,Alabi,7
10,Sreedhar,Kavali,7
I want to find the Level of each employee (Boss is Level 1 and so on):
My Recursive Code in SQL would be:
with recursive cte as
(
select employee_id, first_name, last_name, manager_id, 1 as level
from icqa.employee
where manager_id is null
union
select e.employee_id, e.first_name, e.last_name, e.manager_id, cte.level + 1
from icqa.employee e
inner join cte
on e.manager_id = cte.employee_id
where e.manager_id is not null
)
select * from cte
Upvotes: 4
Views: 3970
Reputation: 880359
You could make a dict mapping emp_id
to mgr_id
and then make a recursive function like
idmap = dict(zip(df['emp_id'], df['mgr_id']))
def depth(id_):
if np.isnan(id_):
return 1
return depth(idmap[id_]) + 1
to compute the depth given an id
.
To make it more efficient (by not repeating calculations for the same id
),
you could use memoization (handled by the @functools.lru_cache
decorator below):
import numpy as np
import pandas as pd
import functools
nan = np.nan
df = pd.DataFrame({'emp_id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'fn': ['Matthew', 'John', 'Chris', 'Sergey', 'Andrey', 'Karen', 'Tri', 'Drew', 'BD', 'Sreedhar'], 'ln': ['Reichek', 'Cottone', 'Winter', 'Bobkov', 'Botelli', 'Goetz', 'Pham', 'Thompson', 'Alabi', 'Kavali'], 'mgr_id': [nan, 3.0, 1.0, 2.0, 2.0, 7.0, 3.0, 7.0, 7.0, 7.0]})
def make_depth(df):
idmap = dict(zip(df['emp_id'], df['mgr_id']))
@functools.lru_cache()
def depth(id_):
if np.isnan(id_):
return 1
return depth(idmap[id_]) + 1
return depth
df['depth'] = df['mgr_id'].apply(make_depth(df))
print(df.sort_values(by='depth'))
yields
emp_id fn ln mgr_id depth
0 1 Matthew Reichek NaN 1
2 3 Chris Winter 1.0 2
1 2 John Cottone 3.0 3
6 7 Tri Pham 3.0 3
3 4 Sergey Bobkov 2.0 4
4 5 Andrey Botelli 2.0 4
5 6 Karen Goetz 7.0 4
7 8 Drew Thompson 7.0 4
8 9 BD Alabi 7.0 4
9 10 Sreedhar Kavali 7.0 4
Upvotes: 5