IndiaSke

Reputation: 358

Pandas sample() with conditions

I have this DataFrame (shortened):

+-------+------------+--------+----------+-------+------+
| index | id_product | margin | supplier | price | seen |
+-------+------------+--------+----------+-------+------+
| 0     | 100000000  | 92.00  | 14       | 0.56  | 2    |
| 1     | 100000230  | 72.21  | 27       | 8.17  | 0    |
| 2     | 100001440  | 72.07  | 15       | 16.20 | 687  |
| 3     | 100002331  | 30.55  | 13       | 41.67 | 0    |
| 7     | 100001604  | 35.17  | 27       | 18.80 | 491  |
| ...   | ...        | ...    | ...      | ...   | ...  |
| 9830  | 100000320  | 77.78  | 18       | 13.33 | 0    |
| 9831  | 100000321  | 77.78  | 98       | 13.33 | 0    |
| 9832  | 100000443  | 77.78  | 17       | 13.33 | 4587 |
| 9834  | 100000292  | 88.13  | 3        | 10.56 | 0    |
| 9835  | 100000236  | 72.21  | 18       | 10.56 | 0    |
+-------+------------+--------+----------+-------+------+

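What I am trying to do is to randomly extract 3 rows, maybe using df.sample(3), but with these conditions:

  • the 3 selected rows should have 3 different ecom_id values: (14,27,13) is good, (14,27,14) is not.
  • rows with a lower seen value should be favored. Is it possible to invert the weights in sample() to favor the lowest values?
  • the 3 selected rows should come from 3 different price ranges: the first should have a price < 20.0, the second a price between 30 and 50, and the third a price > 80.

Is this possible?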

I have tried something like this:

# Split the rows into the three price ranges
pr_1_pd = pr_pd.loc[pr_pd['price'] < 20]
pr_2_pd = pr_pd.loc[(pr_pd['price'] > 30) & (pr_pd['price'] < 50)]
pr_3_pd = pr_pd.loc[pr_pd['price'] > 80]

# Within each range, put the highest margin first, then the lowest seen
pr_1_pd = pr_1_pd.sort_values(by=['margin','seen'],ascending=[False,True])
pr_2_pd = pr_2_pd.sort_values(by=['margin','seen'],ascending=[False,True])
pr_3_pd = pr_3_pd.sort_values(by=['margin','seen'],ascending=[False,True])

But I'm not sure how to combine all the filters together.

Upvotes: 0

Views: 895

Answers (1)

yoskovia

Reputation: 360

  • the 3 selected rows should have 3 different ecom_id values: (14,27,13) is good, (14,27,14) is not.

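Sampling without replacement (replace=False, which is already the default of DataFrame.sample) guarantees that no row is picked twice, so this is enough if ecom_id is unique per row.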
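A minimal sketch of the difference, assuming the ecom_id from the question is the supplier column of the table (the example (14,27,13) matches its values). Since supplier repeats in the data (27 and 18 each appear twice), distinct rows do not imply distinct suppliers; shuffling and then keeping one row per supplier is one way to force that. The frame is a small subset of the rows shown above.

```python
import pandas as pd

# A few rows taken from the table in the question
df = pd.DataFrame({
    "id_product": [100000000, 100000230, 100001440, 100002331, 100001604],
    "supplier": [14, 27, 15, 13, 27],
    "price": [0.56, 8.17, 16.20, 41.67, 18.80],
    "seen": [2, 0, 687, 0, 491],
})

# replace=False (the default) never returns the same row twice,
# but two of the returned rows could still share supplier 27
rows = df.sample(3, replace=False, random_state=0)

# Shuffle all rows, keep the first row per supplier, then take 3
distinct = (
    df.sample(frac=1, random_state=0)
      .drop_duplicates(subset="supplier")
      .head(3)
)
print(distinct)
```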

  • rows with a lower seen value should be favored. Is it possible to invert the weights in sample() to favor the lowest values?

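You could invert the weights, e.g. new_weight = 1 / (seen + 1), to achieve this. The + 1 avoids dividing by zero for rows where seen is 0, which would give infinite weights that sample() rejects.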
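A sketch of the inverted weights on a few rows from the table. The + 1 offset is one choice for keeping every weight finite, not the only one. DataFrame.sample normalizes weights that do not sum to 1, so they can be passed as is.

```python
import pandas as pd

df = pd.DataFrame({
    "id_product": [100000000, 100000230, 100001440, 100002331, 100001604],
    "price": [0.56, 8.17, 16.20, 41.67, 18.80],
    "seen": [2, 0, 687, 0, 491],
})

# Rarely seen rows get large weights, often seen rows small ones.
# Adding 1 keeps rows with seen == 0 finite (1 / 0 would be inf).
weights = 1 / (df["seen"] + 1)

# Weights are aligned on the index and normalized by sample()
picked = df.sample(n=2, weights=weights, random_state=42)
print(picked)
```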

  • The 3 selected rows should come from 3 different price ranges: the first selected row should have a price < 20.0, the second a price between 30 and 50, and the third and last a price > 80.

You'll have to sample from pr_1_pd, pr_2_pd, and pr_3_pd individually and then combine the results using pd.concat to achieve this.
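Putting the three points together, a minimal sketch assuming pr_pd has the columns shown in the question and that ecom_id means supplier. It draws one row per price slice with weights 1 / (seen + 1), drops suppliers already picked before sampling the next slice, and combines the rows with pd.concat. The helper name pick_three is hypothetical. It is greedy: it raises instead of retrying if a slice runs out of eligible rows.

```python
import numpy as np
import pandas as pd


def pick_three(pr_pd, random_state=None):
    """Hypothetical helper: one row per price slice, distinct suppliers,
    rows with a lower 'seen' more likely to be drawn."""
    rng = np.random.RandomState(random_state)
    slices = [
        pr_pd.loc[pr_pd["price"] < 20],
        pr_pd.loc[(pr_pd["price"] > 30) & (pr_pd["price"] < 50)],
        pr_pd.loc[pr_pd["price"] > 80],
    ]
    picked = []
    used_suppliers = set()
    for part in slices:
        # Exclude suppliers already chosen from an earlier slice
        candidates = part.loc[~part["supplier"].isin(used_suppliers)]
        if candidates.empty:
            raise ValueError("no eligible row left in this price slice")
        # Inverse weights favor rows that have been seen less often
        row = candidates.sample(
            n=1, weights=1 / (candidates["seen"] + 1), random_state=rng
        )
        used_suppliers.add(row["supplier"].iloc[0])
        picked.append(row)
    # Stack the three one-row frames in slice order
    return pd.concat(picked)


# Made-up rows for illustration only; the last row reuses supplier 14
pr_pd = pd.DataFrame({
    "id_product": [100000000, 100002331, 100000999, 100000998],
    "supplier": [14, 13, 27, 14],
    "price": [0.56, 41.67, 95.00, 85.00],
    "seen": [2, 0, 10, 0],
})
result = pick_three(pr_pd, random_state=0)
print(result)
```

Here the row priced 85.00 can never be chosen, because supplier 14 is already taken by the first slice.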

Upvotes: 2
