Reputation: 277
I have a pandas dataframe that looks something like this:
brand description former_price discounted_price
0 A icecream 1099.0 855.0
1 A cheese 469.0 375.0
2 B catfood 179.0 119.0
3 C NaN 699.0 399.0
4 NaN icecream 769.0 549.0
5 A icecream 769.0 669.0
I want to create a column that will assign a unique value for each brand & description combination. Note that either the brand or the description can be missing from the dataset (notified by NaN value). Also, note that if the brand and the description is the same (duplicated) I still want the unique value to be the same for the row.
The output should look like this:
product_key brand description former_price discounted_price
0 1 A icecream 1099.0 855.0
1 2 A cheese 469.0 375.0
2 3 B catfood 179.0 119.0
3 4 C NaN 699.0 399.0
4 5 NaN icecream 769.0 549.0
5 1 A icecream 769.0 669.0
The values in product_key can be anything, I just want them to be unique based on brand and description columns. Any help is immensely appreciated!
Thanks a lot!
Upvotes: 8
Views: 5674
Reputation: 6483
You could try with pd.Series.factorize
:
df.set_index(['brand','description']).index.factorize()[0]+1
Output:
0 1
1 2
2 3
3 4
4 5
5 1
So you could try this, to assign it to be the first column:
df.insert(loc=0, column='product_key', value=df.set_index(['brand','description']).index.factorize()[0]+1)
Output:
df
product_key brand description former_price discounted_price
0 1 A icecream 1099.0 855.0
1 2 A cheese 469.0 375.0
2 3 B catfood 179.0 119.0
3 4 C NaN 699.0 399.0
4 5 NaN icecream 769.0 549.0
5 1 A icecream 769.0 669.0
Upvotes: 10
Reputation: 75080
with groupby+ngroup
:
(df.fillna({'brand':'','description':''})
.groupby(['brand','description'],sort=False).ngroup()+1)
0 1
1 2
2 3
3 4
4 5
5 1
Upvotes: 4