johnnyB
johnnyB

Reputation: 165

How to combine two dataframes on parent-child relation (something between concat and merge)

I am struggling to form a tree like dataframe with children rows right under their parents. What I am trying to do is something between merging on object_id x parent_id and concatenating along axis 0. So what I am looking for is implementation of the interlace function in the snippet below.

In[1]: parents = pd.DataFrame({'object_id':[1,2],
                               'parent_id':[0,0],
                               'position': [1,2]})

In[2]: parents

Out[2]    object_id     parent_id   position
       0  1             0           1
       1  2             0           2

In[3]: children = pd.DataFrame({'object_id':[3,4,5],
                                'parent_id':[1,1,2],
                                'position': [1,2,1]})

In[4]: children

Out[4]:   object_id     parent_id   position
       0  3             1           1
       1  4             1           2
       2  5             2           1

In[5]: interlace(parent, children, on=('object_id', 'parent_id'))

Out[5]:  object_id  parent_id   position
      0  1          0           1
      1  3          1           1
      2  4          1           2
      3  2          0           1
      4  5          2           1

Is there an efficient way to do this in pandas? I think one could do something like

parents_with_children = []
for i, parentrow in parents.iteritems():
    childrenrows = children[children.parent_id == parentrow.object_id]
    parents_with_children.append(pd.concat([parentrow, childrenrows])
result = pd.concat(parents_with_children)

But I feel there should be and easier and probably more efficient way to do this.

edit: Rows with the same level and same parent need to stay sorted by their position.

Upvotes: 1

Views: 353

Answers (1)

burhan
burhan

Reputation: 924

A possible solution:

children['sort_id']=children.parent_id
parents['sort_id']=parents.object_id
pd.concat([parents,children]).sort_values(['sort_id', 'parent_id']).drop('sort_id', 1)

Upvotes: 1

Related Questions