Reputation: 503
I have this data:
data = [
    (1, 'Shirt', 2),
    (1, 'Pants', 3),
    (2, 'Top', 2),
    (2, 'Shirt', 1),
    (2, 'T-Shirt', 4),
    (3, 'Shirt', 3),
    (3, 'T-Shirt', 2),
    (4, 'Top', 3),
    (4, 'Pants', 3),
    (4, 'T-Shirt', 3),
]
and I load it into a DataFrame using pandas:
import pandas as pd

df = pd.DataFrame(data, columns=['unique_id', 'category_product', 'count'])
and the resulting DataFrame is:
   unique_id category_product  count
0         11            Shirt      2
1         11            Pants      3
2         24              Top      2
3         24            Shirt      1
4         24          T-Shirt      4
5         36            Shirt      3
6         36          T-Shirt      2
7         48              Top      3
8         48            Pants      3
9         48          T-Shirt      3
But I need to change unique_id so that it starts from 0 and increases in the order seen, giving a result like:
   unique_id category_product  count
0          0            Shirt      2
1          0            Pants      3
2          1              Top      2
3          1            Shirt      1
4          1          T-Shirt      4
5          2            Shirt      3
6          2          T-Shirt      2
7          3              Top      3
8          3            Pants      3
9          3          T-Shirt      3
How can I do that?
Upvotes: 0
Views: 74
Reputation: 180867
There may be simpler ways, but here's one:
df.unique_id = (df.unique_id.diff() != 0).cumsum() - 1
Basically, it compares each row to the previous one; wherever the diff is != 0, the running total increases by 1. The -1 at the end compensates for the first row: it has nothing to diff against, so its diff is NaN, which also counts as != 0 and would make the numbering start at 1 instead of 0.
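To make the steps concrete, here is a minimal, self-contained sketch of the same idea on a cut-down frame. The sample ids (11, 24, 36, 48) are taken from the printed DataFrame in the question, and the names run_ids and codes are only illustrative. It also shows pd.factorize as one possible alternative, which numbers distinct values in order of first appearance.

```python
import pandas as pd

# Cut-down sample frame; the id values are illustrative.
df = pd.DataFrame(
    {
        "unique_id": [11, 11, 24, 24, 24, 36, 36, 48, 48, 48],
        "count": [2, 3, 2, 1, 4, 3, 2, 3, 3, 3],
    }
)

# diff() is NaN on the first row, and NaN != 0 is True,
# so the running sum starts at 1; subtracting 1 makes it start at 0.
run_ids = (df["unique_id"].diff() != 0).cumsum() - 1

# Alternative: factorize() assigns 0, 1, 2, ... to each distinct value
# in the order it is first seen.
codes, uniques = pd.factorize(df["unique_id"])

print(run_ids.tolist())
print(codes.tolist())
```

On data like this, where equal ids sit next to each other, both approaches should give the same numbering. They differ if an id comes back later: the diff/cumsum version starts a new number for every run, while factorize reuses the number it gave that id the first time.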
Upvotes: 1