Cranjis

Reputation: 1960

pandas DataFrame: add rows that are shuffles of the values in specific columns

I have the dataframe:

df = b_150  h_200  b_250  h_300  b_350  h_400  c1  c2  q4
     1.     2.     3.     4.     5.     6.     3.  4.  4

I want to add rows in which the values of b_150, b_250, b_350 are shuffled among themselves, and likewise the values of h_200, h_300, h_400.

For example, calling

df = add_shuffles(df, cols=['b_150', 'b_250', 'b_350'], n=1)
df = add_shuffles(df, cols=['h_200', 'h_300', 'h_400'], n=1)

should add 2 rows (one shuffle of the b columns and one shuffle of the h columns), to get:

df = b_150  h_200  b_250  h_300  b_350  h_400  c1  c2  q4
     1.     2.     3.     4.     5.     6.     3.  4.  4
     3.     2.     5.     4.     1.     6.     3.  4.  4
     1.     2.     3.     6.     5.     4.     3.  4.  4

What is the most efficient way to do it?

Upvotes: 0

Views: 81

Answers (1)

user7864386

Reputation:

Try:

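import random

import numpy as np
import pandas as pd

def columns_shuffler():
    # Pick one group at random and shuffle its column order; keep the other group as is.
    if random.random() < 0.5:
        return random.sample(cols[0], len(cols[0])) + cols[1]
    return cols[0] + random.sample(cols[1], len(cols[1]))

n = 2  # number of shuffled rows to add
msk = df.columns.str.startswith('b_')
msk1 = df.columns.str.startswith('h_')
cols = [df.columns[msk].tolist(), df.columns[msk1].tolist()]
rest = df.columns[~(msk | msk1)].tolist()
# Reading the columns in shuffled order and relabelling them in the
# original order permutes the values within the chosen group.
shuffled = np.vstack([df[columns_shuffler()].to_numpy() for _ in range(n)])
new_rows = pd.DataFrame(np.hstack([shuffled, np.tile(df[rest].to_numpy(), (n, 1))]),
                        columns=cols[0] + cols[1] + rest)
# Restore the original column order and renumber the index.
out = pd.concat([df, new_rows], ignore_index=True)[df.columns]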
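
If you want something closer to the add_shuffles(df, cols, n) call from the question, here is a minimal sketch. The function name and signature come from the question; the seed parameter and the choice of NumPy's Generator.permuted are my own assumptions, not something the question asks for. Note that a random permutation can come out identical to the original order, so a new row is not guaranteed to differ from its source row.

```python
import numpy as np
import pandas as pd


def add_shuffles(df, cols, n=1, seed=None):
    """Append n copies of every row of df with the values in `cols` permuted."""
    rng = np.random.default_rng(seed)  # `seed` is a hypothetical extra, for reproducibility
    cols = list(cols)
    # Repeat every original row n times (by position, so duplicate labels are safe).
    positions = np.repeat(np.arange(len(df)), n)
    new_rows = df.iloc[positions].reset_index(drop=True)
    # permuted(..., axis=1) shuffles each row independently of the others.
    new_rows[cols] = rng.permuted(new_rows[cols].to_numpy(), axis=1)
    return pd.concat([df, new_rows], ignore_index=True)


df = pd.DataFrame(
    [[1., 2., 3., 4., 5., 6., 3., 4., 4]],
    columns=["b_150", "h_200", "b_250", "h_300", "b_350", "h_400", "c1", "c2", "q4"],
)
out = add_shuffles(df, cols=["b_150", "b_250", "b_350"], n=1, seed=0)
```

Chaining two calls, as written in the question, would also shuffle the rows added by the first call (4 rows in total). To get exactly the 3-row result shown, apply both calls to the original df and concatenate only the newly added rows.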

Upvotes: 1
