Reputation: 127
I want to move every row that has the value in before_id column into the below of the row with same value as that before_id in its node_id column. I've been struggling with it and I already tried .reindex but it was a not good solution for me. So is it possible to change the position of each row ?? or should I do it without pandas ?? Could you anyone give me tips ?? Thank you so much .
client_data = """node_id,option,before_id
1,A,
5,A,4
4,C,1
6,A,
2,A,1
7,C,6
"""
df = pd.read_csv(io.StringIO(client_data), error_bad_lines=False)
before = pd.notnull(df['before_id'])
for ind,row in df.iterrows():
if pd.notnull(row[2]):
before_id = row[2] # identifying before_id column
before_ind = df.loc[df['node_id'] == before_id].index
next_ind = before_ind[0] + 1 # the goal index
row.reindex(index=next_ind)
else:
pass
df.to_csv('./result3.csv', index=False)
ideal output
node_id,option,before_id
1,A,
2,A,1
8,C,2
6,A,
7,C,6
Upvotes: 1
Views: 226
Reputation: 4011
If I understand correctly:
df = pd.DataFrame({'node_id': {0: 1, 1: 5, 2: 4, 3: 6, 4: 2, 5: 7},
'option': {0: 'A', 1: 'A', 2: 'C', 3: 'A', 4: 'A', 5: 'C'},
'before_id': {0: np.nan, 1: 4.0, 2: 1.0, 3: np.nan, 4: 1.0, 5: 6.0}})
# sort so 'option' is ordered alphabetically when 'before_id' is same (?)
df.sort_values('option', inplace=True)
# count occurrences of each 'before_id' & add to create counter
df['x'] = .01
df['y'] = df.groupby('before_id')['x'].transform(np.cumsum) + df['before_id']
# create sorting column
df['z'] = df['y'].fillna(df['node_id'])
df.sort_values(['z'], inplace=True)
returns
node_id option before_id x y z
0 1 A NaN 0.01 NaN 1.00
4 2 A 1.0 0.01 1.01 1.01
2 4 C 1.0 0.01 1.02 1.02
1 5 A 4.0 0.01 4.01 4.01
3 6 A NaN 0.01 NaN 6.00
5 7 C 6.0 0.01 6.01 6.01
and you can then drop x
, y
,and z
.
Upvotes: 1