Reputation: 358
I have this DataFrame (shortened):
+-------+------------+--------+----------+-------+------+
| index | id_product | margin | supplier | price | seen |
+-------+------------+--------+----------+-------+------+
| 0 | 100000000 | 92.00 | 14 | 0.56 | 2 |
| 1 | 100000230 | 72.21 | 27 | 8.17 | 0 |
| 2 | 100001440 | 72.07 | 15 | 16.20 | 687 |
| 3 | 100002331 | 30.55 | 13 | 41.67 | 0 |
| 7 | 100001604 | 35.17 | 27 | 18.80 | 491 |
| ... | ... | ... | ... | ... | ... |
| 9830 | 100000320 | 77.78 | 18 | 13.33 | 0 |
| 9831 | 100000321 | 77.78 | 98 | 13.33 | 0 |
| 9832 | 100000443 | 77.78 | 17 | 13.33 | 4587 |
| 9834 | 100000292 | 88.13 | 3 | 10.56 | 0 |
| 9835 | 100000236 | 72.21 | 18 | 10.56 | 0 |
+-------+------------+--------+----------+-------+------+
What I am trying to do is randomly extract 3 rows, perhaps using df.sample(3), but with these conditions:
- The 3 selected rows should have 3 different ecom_id values: (14,27,13) is good, (14,27,14) is not.
- Rows with higher margins should be favored. I use weights='margin', and it works fine.
- Rows with lower seen values should be favored. Is it possible to reverse the weighting in sample() to favor the lowest values?
- The 3 selected rows should come from 3 different price slices: the first selected row should have a price < 20.0, the second a price between 30 and 50, and the third and last a price > 80.
Is this possible?
I have tried something like:
pr_1_pd = pr_pd.loc[pr_pd['price'] < 20]
pr_2_pd = pr_pd.loc[(pr_pd['price'] > 30) & (pr_pd['price'] < 50)]
pr_3_pd = pr_pd.loc[pr_pd['price'] > 80]
pr_1_pd = pr_1_pd.sort_values(by=['margin','seen'],ascending=[False,True])
pr_2_pd = pr_2_pd.sort_values(by=['margin','seen'],ascending=[False,True])
pr_3_pd = pr_3_pd.sort_values(by=['margin','seen'],ascending=[False,True])
But I'm not sure how to combine all the filters together.
Upvotes: 0
Views: 895
Reputation: 360
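- The 3 selected rows should have 3 different ecom_id values: (14,27,13) is good, (14,27,14) is not.
Setting replace=False in DataFrame.sample (this is already the default) only prevents the same row from being drawn twice, so it achieves this only if every ecom_id appears in exactly one row. If an ecom_id can appear in several rows, drop the rows whose ecom_id has already been picked before drawing the next row.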
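- Rows with lower seen values should be favored. Is it possible to reverse the weighting in sample() to favor the lowest values?
You could invert the weights to achieve this, for example new_weight = 1 / (seen + 1). The + 1 is needed because seen contains zeros, and 1 / seen would produce infinite weights, which sample() rejects. To keep favoring high margins as well, the two can be combined into a single weight such as margin / (seen + 1).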
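As a rough illustration of the inverted weighting, here is a minimal sketch on a toy frame modelled on the question's table. The rows are illustrative only, and the combined formula margin / (seen + 1) is just one possible choice, not something pandas prescribes.

```python
import pandas as pd

# Toy rows modelled on the question's table (illustrative values only).
df = pd.DataFrame({
    "id_product": [100000000, 100000230, 100001440, 100002331, 100001604],
    "margin": [92.00, 72.21, 72.07, 30.55, 35.17],
    "seen": [2, 0, 687, 0, 491],
})

# Assumption: adding 1 before inverting keeps rows with seen == 0 finite.
# Dividing margin by (seen + 1) favors high margins and low seen counts.
weights = df["margin"] / (df["seen"] + 1)

# sample() normalizes the weights itself, so they need not sum to 1.
picked = df.sample(n=1, weights=weights, random_state=0)
print(picked)
```

Rows that were never seen keep their full margin as weight, while heavily seen rows are pushed close to zero without being excluded outright.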
- The 3 selected rows should come from 3 different price slices: the first selected row should have a price < 20.0, the second a price between 30 and 50, and the third and last a price > 80.
You'll have to sample from pr_1_pd, pr_2_pd, and pr_3_pd individually and then combine the results using pd.concat to achieve this.
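Putting the three points together, one possible sketch looks like the following. The helper name sample_one_per_slice is hypothetical, the toy data is made up, and it assumes that ecom_id corresponds to the supplier column shown in the shortened table and that margins are positive. Rows whose id was already picked are filtered out before each draw, and the weight margin / (seen + 1) is an assumed way of combining both preferences.

```python
import pandas as pd

def sample_one_per_slice(df, id_col="supplier", random_state=None):
    """Draw one weighted row per price slice, never reusing an id_col value.

    Hypothetical helper: id_col="supplier" assumes that column holds ecom_id.
    """
    slices = [
        df["price"] < 20,
        (df["price"] > 30) & (df["price"] < 50),
        df["price"] > 80,
    ]
    used_ids = []
    picks = []
    for mask in slices:
        # Keep rows in this price slice whose id has not been picked yet.
        candidates = df[mask & ~df[id_col].isin(used_ids)]
        if candidates.empty:
            raise ValueError("no eligible row left in one of the price slices")
        # Assumed weighting: high margin and low seen are both favored.
        weights = candidates["margin"] / (candidates["seen"] + 1)
        row = candidates.sample(n=1, weights=weights, random_state=random_state)
        used_ids.extend(row[id_col].tolist())
        picks.append(row)
    return pd.concat(picks)

# Toy data (illustrative values only).
df = pd.DataFrame({
    "id_product": [1, 2, 3, 4, 5, 6],
    "margin": [92.0, 72.21, 30.55, 35.17, 77.78, 88.13],
    "supplier": [14, 27, 13, 14, 14, 3],
    "price": [0.56, 8.17, 41.67, 45.00, 95.00, 120.00],
    "seen": [2, 0, 0, 491, 0, 4587],
})

result = sample_one_per_slice(df, random_state=0)
print(result)
```

Because earlier draws constrain later ones, a slice can run out of eligible rows. The sketch raises an error in that case, but retrying with a different draw would be another reasonable choice.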
Upvotes: 2