Reputation: 31
I want to shuffle my DataFrame with the pd.DataFrame.sample method, but as a conditional sample. The result must satisfy the following condition: no two consecutive rows may hold the same value in a certain column.
For example, given:
A B
a 2
b 3
c 3
d 4
I want the sampled result to look like this:
A B
a 2
c 3
d 4
b 3
(i.e. rows where B holds the same value must not be consecutive)
How can I achieve this? I already have the code below, but as a newbie I am not familiar with writing loops:
def compareSyllable(data):
    for i in data:
        (data['B'] != data['B'].shift()).any()

while compareSyllable(data) == True:
    for i in range(data):
        data_final=data.sample(frac=1)
Upvotes: 1
Views: 196
Reputation: 294498
compare_syllable
I'd use the values attribute to get a numpy array. The syntax is cleaner for this comparison. And IIUC, you don't want any matching values from one row to the next, in any column. So using all on a numpy array automatically checks every value in the array.
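To make the slicing comparison concrete, here is a minimal sketch on a one-dimensional array holding the B values from the question. The variable names are only illustrative; with the two-dimensional array from a DataFrame's values attribute, the same expression compares every column at once.

```python
import numpy as np

# B values from the question, in their original order
v = np.array([2, 3, 3, 4])

# v[:-1] drops the last value and v[1:] drops the first, so the
# elementwise comparison pairs each row with the row after it
pairs_differ = v[:-1] != v[1:]

print(pairs_differ.tolist())     # [True, False, True]
print(bool(pairs_differ.all()))  # False, because 3 is followed by 3
```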
conditional_sample
I made this so that you can pass whatever parameters you want to pd.DataFrame.sample via pd.DataFrame.pipe. I've also parameterized the predicate on which the conditional sampling depends, meaning you can create other condition functions to pass in to suit other purposes.
pd.DataFrame.pipe
pipe is a DataFrame method whose first argument is a callable. pipe passes the DataFrame that called it to that callable as its first argument. The rest of the arguments are passed on to the callable.
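As a small illustration of that forwarding behaviour, the sketch below calls a helper both through pipe and directly. The helper head_rows is hypothetical and exists only for this demonstration.

```python
import pandas as pd

# Sample data mirroring the question's example
df = pd.DataFrame({"A": list("abcd"), "B": [2, 3, 3, 4]})

def head_rows(frame, n, reverse=False):
    """Hypothetical helper: first n rows, optionally in reverse order."""
    out = frame.head(n)
    return out.iloc[::-1] if reverse else out

# pipe passes df as the first argument, then forwards the rest
via_pipe = df.pipe(head_rows, 2, reverse=True)
direct = head_rows(df, 2, reverse=True)

print(via_pipe.equals(direct))  # True
```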
def compare_syllable(data):
    v = data.values
    return (v[:-1] != v[1:]).all()

def conditional_sample(df, predicate, *args, **kwargs):
    d = df.sample(*args, **kwargs)
    while not predicate(d):
        d = df.sample(*args, **kwargs)
    return d

df.pipe(conditional_sample, compare_syllable, frac=1)
A B
2 c 3
0 a 2
1 b 3
3 d 4
I'll also point out that pipe was a choice of mine; we could also have called this in a more conventional way, like this:
conditional_sample(
    df=df,
    predicate=compare_syllable,
    frac=1
)
A B
2 c 3
0 a 2
3 d 4
1 b 3
To further demonstrate that we could have passed other parameters, I'll forgo the frac=1 argument and pass 3 instead, which will be the number of rows to sample:
df.pipe(conditional_sample, compare_syllable, 3)
A B
1 b 3
0 a 2
3 d 4
Which also satisfies the condition.
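Because the predicate is a parameter, a check restricted to a single column, as the question asks for with column B, could be sketched as follows. The factory no_consecutive_repeats is a hypothetical name introduced here; conditional_sample is repeated only so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({"A": list("abcd"), "B": [2, 3, 3, 4]})

def no_consecutive_repeats(column):
    """Hypothetical factory: build a predicate that checks one column only."""
    def predicate(data):
        v = data[column].to_numpy()
        # Compare each value with the one in the next row
        return bool((v[:-1] != v[1:]).all())
    return predicate

def conditional_sample(df, predicate, *args, **kwargs):
    # Same resampling loop as in the answer: retry until the predicate holds
    d = df.sample(*args, **kwargs)
    while not predicate(d):
        d = df.sample(*args, **kwargs)
    return d

result = df.pipe(conditional_sample, no_consecutive_repeats("B"), frac=1)
print(result)
```

The output varies from run to run because the sampling is random, but the two rows with B equal to 3 should never be adjacent.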
A version that includes infinite loop handling
from itertools import count, islice

def compare_syllable(data):
    v = data.values
    return (v[:-1] != v[1:]).all()

def conditional_sample(df, predicate, limit=None, **kwargs):
    """default limit of None is infinite limit"""
    d = df.sample(**kwargs)
    # islice(count(), None) never ends, so limit=None keeps retrying
    for i in islice(count(), limit):
        if predicate(d):
            return d
        d = df.sample(**kwargs)
    # falls through and returns None if no sample passed within limit tries

df.pipe(conditional_sample, compare_syllable, limit=10, frac=1)
A B
2 c 3
3 d 4
1 b 3
0 a 2
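Note that when the limit is exhausted, the function above falls out of the loop and returns None. If an explicit failure is preferable, one possible variant raises an exception instead. This is only a sketch: sample_with_limit, its default limit of 100, and the choice of RuntimeError are assumptions made for illustration.

```python
import pandas as pd

def compare_syllable(data):
    v = data.values
    return (v[:-1] != v[1:]).all()

def sample_with_limit(df, predicate, limit=100, **kwargs):
    """Hypothetical variant: raise instead of returning None at the limit."""
    for _ in range(limit):
        d = df.sample(**kwargs)
        if predicate(d):
            return d
    raise RuntimeError(f"no sample satisfied the predicate in {limit} attempts")

# Every row has the same B value, so no ordering can satisfy the predicate
impossible = pd.DataFrame({"A": list("abc"), "B": [1, 1, 1]})

try:
    impossible.pipe(sample_with_limit, compare_syllable, limit=10, frac=1)
except RuntimeError as err:
    print(err)
```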
Upvotes: 2