okkotsu
okkotsu

Reputation: 127

Shift the records in Pandas DataFrame with conditions

I've been struggling with shifting pandas record's position with specific conditions. What I want to do is to move every row that has a value in its before_id(it includes String) column into the next index place of another row whose value in node_id(it includes String) is same as it. And my code didn't work as my expectation. It returned no error but didn't sort them at all.

I'm a little bit worried about my explain though .... Do you have any idea to achieve this ?? or Is it impossible to do in Pandas ??

client_data = """node_id,option,before_id
    1aa,A,
    8xyz,C,2aa
    2aa,A,1aa
    5mm,A,4bb
    4bb,C,8xyz
    6ccc,5mm,
    7,C,6ccc       
    """


df = pd.read_csv(io.StringIO(client_data), dtype='string', error_bad_lines=False)
    before = pd.notnull(df['before_id'])
    for ind,row in df.iterrows():
        if pd.notnull(row[2]):
            before_id = row[2]
            before_ind = df.index[df['node_id']==before_id]
            next_ind = before_ind + 1
            row.reindex(index=next_ind)
        else:
            pass
df

node_id,option,before_id
1aa,A,
8xyz,C,2aa
2aa,A,1aa
5mm,A,4bb
4bb,C,8xyz
6ccc,5mm,
7,C,6ccc


ideal output 

node_id,option,before_id
1aa,A,
2aa,A,1aa
8xyz,C,2aa
4bb,C,8xyz
5mm,A,4bb
6ccc,5mm,
7,C,6ccc

Upvotes: 1

Views: 105

Answers (1)

S2L
S2L

Reputation: 1934

Appears that you are trying to solve a depth-first search graph problem. Below solution is using networkx graph library.

import pandas as pd
import io

x = '''node_id,option,before_id
1aa,A,
8xyz,C,2aa
2aa,A,1aa
5mm,A,4bb
4bb,C,8xyz
6ccc,5mm,
7,C,6ccc'''

df = pd.read_csv(io.StringIO(x))
print("Original\n",df)
print('--------------\n')

import networkx as nx
G = nx.from_pandas_edgelist(df, 'before_id', 'node_id', 'option')
sortedNodes = [n[1] for n in list(nx.dfs_edges(G))]

df.node_id = df.node_id.astype('category')
df.node_id.cat.set_categories(sortedNodes, inplace=True)
df = df.sort_values(['node_id'])
print("Node-sorted\n",df)

Output:

Original
   node_id option before_id
0     1aa      A       NaN
1    8xyz      C       2aa
2     2aa      A       1aa
3     5mm      A       4bb
4     4bb      C      8xyz
5    6ccc    5mm       NaN
6       7      C      6ccc
--------------

Node-sorted
   node_id option before_id
0     1aa      A       NaN
2     2aa      A       1aa
1    8xyz      C       2aa
4     4bb      C      8xyz
3     5mm      A       4bb
5    6ccc    5mm       NaN
6       7      C      6ccc

Upvotes: 4

Related Questions