Reputation: 4807
I am following the answer from the link:
If I have a dataframe df as:
Month Day mnthShape
1 1 1.01
1 1 1.09
1 1 0.96
1 2 1.01
1 1 1.09
1 2 0.96
1 3 1.01
1 3 1.09
1 3 1.78
I want to get the following from df:
Month Day mnthShape
1 1 1.01
1 2 1.01
1 1 0.96
where the mnthShape values are selected at random from the index without replacement, i.e. if the query is df.loc[(1, 1)], it should look up all values for (1, 1) and randomly select one of them to be displayed above. If another df.loc[(1, 1)] appears, it should select randomly again, but without replacement.
I know I need to modify the code to use the following:
apply(np.random.choice, replace=False)
But I am not sure how to do it.
Edit:
Every time I do df.loc[(1, 1)], it should give a new value without replacement. I intend to do df.loc[(1, 1)] multiple times. In the previous question, it was just one time.
Upvotes: 0
Views: 959
Reputation: 15432
If you're trying to sample from the dataset without replacement, it probably makes sense to do this all in one go, rather than iteratively pulling a sample from the dataset.
Pulling N samples from each month/day combo requires that every combo has at least N rows, since you cannot draw N values without replacement from fewer than N. Assuming this is true, you could write a function to sample N values from a subset of the data:
import numpy as np

def select_n(subset, n=2):
    # Pick n distinct row positions within this group (no replacement)
    choices = np.random.choice(len(subset), size=n, replace=False)
    return (
        subset
        .mnthShape
        .iloc[choices]
        .reset_index(drop=True)
        .rename_axis('choice'))
To apply this across the whole dataset:
In [34]: df.groupby(['Month', 'Day']).apply(select_n)
Out[34]:
choice        0     1
Month Day
1     1    1.09  0.96
      2    0.96  1.01
      3    1.09  1.01
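As an alternative sketch of the same all-in-one-go idea, pandas 1.1 and later provide a sample method on groupby objects, which draws without replacement by default. The snippet below rebuilds the question's df by hand and assumes that pandas version; the random_state value is arbitrary and only there to make the run repeatable.

```python
import pandas as pd

# The question's data, rebuilt by hand
df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.01, 1.09, 0.96, 1.01, 1.09, 0.96, 1.01, 1.09, 1.78],
})

# Draw 2 rows from every (Month, Day) group; replace=False is the default,
# so no row can be picked twice within a group.
sampled = df.groupby(["Month", "Day"]).sample(n=2, random_state=0)
print(sampled)
```

Unlike select_n, this returns the sampled rows in long format with their original index labels, and it raises a ValueError if any group has fewer than n rows.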
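If you really need to pull these one at a time, you'll still need to generate the samples all at once to guarantee that they're drawn without replacement, but you could generate the sample indices separately from subsetting the data. Note that the first argument to np.random.choice (3 here) has to match the number of rows in the group you index into, otherwise .iloc can run out of bounds: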
In [48]: indices = np.random.choice(3, size=2, replace=False)
In [49]: df[((df.Month == 1) & (df.Day == 2))].iloc[indices[0]]
Out[49]:
Month        1.00
Day          2.00
mnthShape    1.01
Name: 3, dtype: float64
In [50]: df[((df.Month == 1) & (df.Day == 2))].iloc[indices[1]]
Out[50]:
Month        1.00
Day          2.00
mnthShape    0.96
Name: 5, dtype: float64
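One way to package the one-at-a-time approach is to shuffle each group once up front and then hand the values out in order, so repeated lookups never return the same row twice. This is only a minimal sketch: the names pools and draw are made up for illustration, the seed is arbitrary, and df is rebuilt from the question's data.

```python
import numpy as np
import pandas as pd

# The question's data, rebuilt by hand
df = pd.DataFrame({
    "Month": [1] * 9,
    "Day": [1, 1, 1, 2, 1, 2, 3, 3, 3],
    "mnthShape": [1.01, 1.09, 0.96, 1.01, 1.09, 0.96, 1.01, 1.09, 1.78],
})

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

# Shuffle every group's values once; each iterator then yields them one by one.
pools = {
    key: iter(rng.permutation(group["mnthShape"].to_numpy()))
    for key, group in df.groupby(["Month", "Day"])
}

def draw(month, day):
    """Return the next unused mnthShape value for (month, day)."""
    value = next(pools[(month, day)], None)
    if value is None:
        raise ValueError(f"no values left for {(month, day)}")
    return float(value)

first = draw(1, 1)
second = draw(1, 1)  # a different row from the same group
print(first, second)
```

Because the question's data contains the value 1.09 twice for (1, 1), two draws can still show the same number; what is guaranteed is that no row is used more than once.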
Upvotes: 2