Reputation: 1960
I have the dataframe:
df = b_150  h_200  b_250  h_300  b_350  h_400  c1  c2  q4
     1.     2.     3.     4.     5.     6.     3.  4.  4.
I want to add rows in which the values are shuffled among b_150, b_250, b_350 and, separately, among h_200, h_300, h_400.
For example, after calling
df = add_shuffles(df, cols=['b_150', 'b_250', 'b_350'], n=1)
df = add_shuffles(df, cols=['h_200', 'h_300', 'h_400'], n=1)
two rows are added (one shuffle of the b columns and one shuffle of the h columns), giving:
df = b_150  h_200  b_250  h_300  b_350  h_400  c1  c2  q4
     1.     2.     3.     4.     5.     6.     3.  4.  4.
     3.     2.     5.     4.     1.     6.     3.  4.  4.
     1.     2.     3.     6.     5.     4.     3.  4.  4.
What is the most efficient way to do it?
Upvotes: 0
Views: 81
Try:
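import random
import numpy as np
import pandas as pd

n = 2  # number of shuffled rows to add (choose as needed)

msk = df.columns.str.contains('b')   # b_* columns
msk1 = df.columns.str.contains('h')  # h_* columns
cols = dict(enumerate([df.columns[msk].tolist(), df.columns[msk1].tolist()]))
rest = df.columns[~(msk | msk1)].tolist()  # columns that are never shuffled

def columns_shuffler():
    # Pick one group at random and shuffle only that group's column order.
    # random.sample can return the original order, so a row may come out unchanged.
    if random.choice([True, False]):
        return random.sample(cols[0], len(cols[0])) + cols[1]
    return cols[0] + random.sample(cols[1], len(cols[1]))

# Read the values in shuffled column order, then relabel them with the original names
shuffled = np.vstack([df[columns_shuffler()].to_numpy() for _ in range(n)])
fixed = np.tile(df[rest].to_numpy(), (n, 1))
new_rows = pd.DataFrame(np.hstack([shuffled, fixed]), columns=cols[0] + cols[1] + rest)
out = pd.concat([df, new_rows], ignore_index=True)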
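If you would rather control which group gets shuffled, as in the two calls from the question, something along these lines might work. This is only a sketch: `shuffled_rows` is a hypothetical helper (not a pandas function), the `seed` parameter is my own addition for reproducibility, and the frame is rebuilt from the question's single example row. Unlike the `add_shuffles` in the question, it returns only the new rows, so both groups are shuffled from the original row and the result has the 3 rows shown in the expected output.

```python
import numpy as np
import pandas as pd


def shuffled_rows(df, cols, n=1, seed=None):
    """Hypothetical helper: return n copies of df with the values of `cols`
    permuted among those columns (all other columns unchanged)."""
    rng = np.random.default_rng(seed)
    cols = list(cols)
    pieces = []
    for _ in range(n):
        # A random permutation can be the identity, so a copy may be unchanged
        perm = rng.permutation(len(cols))
        copy = df.copy()
        # Move values between the selected columns; same permutation for every row
        copy[cols] = df[cols].to_numpy()[:, perm]
        pieces.append(copy)
    return pd.concat(pieces, ignore_index=True)


# Example frame taken from the question
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 3.0, 4.0, 4.0]],
    columns=["b_150", "h_200", "b_250", "h_300", "b_350", "h_400", "c1", "c2", "q4"],
)
b_cols = ["b_150", "b_250", "b_350"]
h_cols = ["h_200", "h_300", "h_400"]

out = pd.concat(
    [df, shuffled_rows(df, b_cols, n=1), shuffled_rows(df, h_cols, n=1)],
    ignore_index=True,
)
print(out)
```

The permutation is applied to the NumPy array in one step per copy, so there is no Python-level loop over rows. If you need every added row to differ from the original, you would have to reject identity permutations yourself.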
Upvotes: 1