Mookayama
Mookayama

Reputation: 1343

python dataframe do thing like oracle connect_by?

In oracle, I can do something like this for parent-child hierarchy structure. It is very handy. Now do I do the equivalent in pandas dataframe

with tab1zx as (
select 1 as id, null as parent_id from dual union
select 2 as id, 1 as parent_id from dual union
select 3 as id, 2 as parent_id from dual union
select 4 as id, 2 as parent_id from dual union
select 5 as id, 4 as parent_id from dual union
select 6 as id, 4 as parent_id from dual union
select 7 as id, 1 as parent_id from dual union
select 8 as id, 7 as parent_id from dual union
select 9 as id, 1 as parent_id from dual union
select 10 as id, 9 as parent_id from dual union
select 11 as id, 10 as parent_id from dual union
select 12 as id, 9 as parent_id from dual

)
--select * from tab1zx
SELECT id,
       parent_id,
       RPAD('.', (level-1)*2, '.') || id AS tree,
       level,
       CONNECT_BY_ROOT id AS root_id,
       LTRIM(SYS_CONNECT_BY_PATH(id, '-'), '-') AS path,
       CONNECT_BY_ISLEAF AS leaf
FROM   tab1zx
START WITH parent_id IS NULL
CONNECT BY parent_id = PRIOR id
ORDER SIBLINGS BY id;

Now if I have a python dataframe how do I do this

aa={
    'id':["1","2","3","4","5","6","7","8","9","10","11","12"]
    ,'parent_id':["","1","2","2","4","4","1","7","1","9","10","9"]
}

import pandas as pd
df=pd.DataFrame.from_dict(aa)

Upvotes: 2

Views: 1154

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210852

you can try to do a cartesian product and filter it:

In [32]: df.assign(k=0) \
           .merge(df.assign(k=0), on='k', suffixes=['1','2']) \
           .query("id1 == parent_id2")
Out[32]:
    id1 parent_id1  k id2 parent_id2
1     1             0   2          1
6     1             0   7          1
8     1             0   9          1
14    2          1  0   3          2
15    2          1  0   4          2
40    4          2  0   5          4
41    4          2  0   6          4
79    7          1  0   8          7
105   9          1  0  10          9
107   9          1  0  12          9
118  10          9  0  11         10

PS AFAIK there is no efficient analog for Oracle's hierarchical queries in Pandas

Upvotes: 3

Related Questions