Python + pandas replace column values based in other column with

Question

My question is simple. Let's start with a hypothetical table:

| a | b | c |  
| 0 | 0 | 0 |  
| 0 | 0 | 0 |  
| 0 | 1 | 1 |  
| 0 | 1 | 2 |  
| 1 | 2 | 3 |  
| 1 | 2 | 3 |  
| 1 | 3 | 4 |

This table should be modified so that the numbers in column b do not repeat numbers already used in column a, and the numbers in column c do not repeat numbers already used in column b.
In other words, the numbering should keep incrementing from one column to the next.
The final table should look like this:

| a | b | c  |  
| 0 | 2 | 6  |  
| 0 | 2 | 6  |  
| 0 | 3 | 7  |  
| 0 | 3 | 8  |  
| 1 | 4 | 9  |  
| 1 | 4 | 9  |  
| 1 | 5 | 10 |

Finally, there is one more table. Based on the previous table, since c is a child of b and b is a child of a, it should be converted into a two-column table, with a third column giving the height of each row:

| parent | child | size |  
| 0      | 2     | 2    |  
| 0      | 3     | 2    |  
| 1      | 4     | 2    |  
| 1      | 5     | 1    |  
| 2      | 6     | 2    |  
| 3      | 7     | 1    |  
| 3      | 8     | 1    |  
| 4      | 9     | 2    |  
| 5      | 10    | 1    |

In this third table, all rows are unique and the whole tree is mapped using only two columns. The third column is the height of each row, that is, how many times each pair appears in the second table.

BENY · Accepted Answer

IIUC, you can do this using factorize:

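import pandas as pd
import numpy as np

# Sample data from the question's first table
df = pd.DataFrame({'a': [0, 0, 0, 0, 1, 1, 1],
                   'b': [0, 0, 1, 1, 2, 2, 3],
                   'c': [0, 0, 1, 2, 3, 3, 4]})
# Stack column by column (a, b, c), tag each value with its column name, then factorize
s=df.stack().sort_index(level=[1])
s=pd.Series(pd.factorize(s.astype(str)+s.index.get_level_values(level=1))[0],index=s.index).unstack()
s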
Out[665]:
   a  b   c
0  0  2   6
1  0  2   6
2  0  3   7
3  0  3   8
4  1  4   9
5  1  4   9
6  1  5  10
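
If the one-liner is hard to follow, the same relabeling idea can be sketched column by column: factorize each column on its own and shift the codes by the number of distinct labels seen in the earlier columns. This is only an illustrative sketch, not part of the answer's code; `relabel_columns` is a hypothetical helper name, and it assumes the column order in the frame is the parent-to-child order (a, b, c).

```python
import pandas as pd

# Sample data from the question's first table
df = pd.DataFrame({
    "a": [0, 0, 0, 0, 1, 1, 1],
    "b": [0, 0, 1, 1, 2, 2, 3],
    "c": [0, 0, 1, 2, 3, 3, 4],
})


def relabel_columns(frame):
    """Hypothetical helper: give every column its own range of labels.

    Each column is factorized separately (codes follow order of appearance),
    then shifted by the count of distinct labels used by earlier columns.
    """
    out = pd.DataFrame(index=frame.index)
    offset = 0
    for col in frame.columns:
        codes, uniques = pd.factorize(frame[col])
        out[col] = codes + offset
        offset += len(uniques)
    return out


relabeled = relabel_columns(df)
print(relabeled)
```

For the sample data this gives the same a, b, c values as the output shown above. Note that `pd.factorize` assigns -1 to missing values, so a frame containing NaN would need extra handling.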


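# Stack the (a, b) pairs on top of the (b, c) pairs, then count how often each (parent, child) pair occurs
yourdf=pd.DataFrame(np.append(s[['a','b']].values,s[['b','c']].values,axis=0),columns=['p','c']).groupby(['p','c']).size().reset_index()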
yourdf
Out[681]: 
   p   c  0
0  0   2  2
1  0   3  2
2  1   4  2
3  1   5  1
4  2   6  2
5  3   7  1
6  3   8  1
7  4   9  2
8  5  10  1
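
To get the column names the question asks for (parent, child, size) and to handle more than three levels, a possible variant pairs each column with the next one and passes `name="size"` to `reset_index`. This is a sketch under assumptions: `edge_table` is a hypothetical helper name, and the input is assumed to be the already relabeled table with columns ordered from root to leaf.

```python
import pandas as pd

# Relabeled table (second table in the question)
relabeled = pd.DataFrame({
    "a": [0, 0, 0, 0, 1, 1, 1],
    "b": [2, 2, 3, 3, 4, 4, 5],
    "c": [6, 6, 7, 8, 9, 9, 10],
})


def edge_table(frame):
    """Hypothetical helper: build a parent/child/size table from adjacent columns."""
    cols = list(frame.columns)
    pairs = [
        # Rename each adjacent column pair to a common (parent, child) schema
        frame[[parent, child]].rename(columns={parent: "parent", child: "child"})
        for parent, child in zip(cols, cols[1:])
    ]
    edges = pd.concat(pairs, ignore_index=True)
    # Count how many rows share each (parent, child) pair
    return edges.groupby(["parent", "child"]).size().reset_index(name="size")


edges = edge_table(relabeled)
print(edges)
```

For the sample data this produces the same nine rows as the output above, with the count column named `size` instead of `0`.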
