Reputation: 1343
In oracle, I can do something like this for parent-child hierarchy structure. It is very handy. Now do I do the equivalent in pandas dataframe
with tab1zx as (
select 1 as id, null as parent_id from dual union
select 2 as id, 1 as parent_id from dual union
select 3 as id, 2 as parent_id from dual union
select 4 as id, 2 as parent_id from dual union
select 5 as id, 4 as parent_id from dual union
select 6 as id, 4 as parent_id from dual union
select 7 as id, 1 as parent_id from dual union
select 8 as id, 7 as parent_id from dual union
select 9 as id, 1 as parent_id from dual union
select 10 as id, 9 as parent_id from dual union
select 11 as id, 10 as parent_id from dual union
select 12 as id, 9 as parent_id from dual
)
--select * from tab1zx
SELECT id,
parent_id,
RPAD('.', (level-1)*2, '.') || id AS tree,
level,
CONNECT_BY_ROOT id AS root_id,
LTRIM(SYS_CONNECT_BY_PATH(id, '-'), '-') AS path,
CONNECT_BY_ISLEAF AS leaf
FROM tab1zx
START WITH parent_id IS NULL
CONNECT BY parent_id = PRIOR id
ORDER SIBLINGS BY id;
Now if I have a python dataframe how do I do this
aa={
'id':["1","2","3","4","5","6","7","8","9","10","11","12"]
,'parent_id':["","1","2","2","4","4","1","7","1","9","10","9"]
}
import pandas as pd
df=pd.DataFrame.from_dict(aa)
Upvotes: 2
Views: 1154
Reputation: 210852
you can try to do a cartesian product and filter it:
In [32]: df.assign(k=0) \
.merge(df.assign(k=0), on='k', suffixes=['1','2']) \
.query("id1 == parent_id2")
Out[32]:
id1 parent_id1 k id2 parent_id2
1 1 0 2 1
6 1 0 7 1
8 1 0 9 1
14 2 1 0 3 2
15 2 1 0 4 2
40 4 2 0 5 4
41 4 2 0 6 4
79 7 1 0 8 7
105 9 1 0 10 9
107 9 1 0 12 9
118 10 9 0 11 10
PS AFAIK there is no efficient analog for Oracle's hierarchical queries
in Pandas
Upvotes: 3