destinychoice

Reputation: 45

randomly select one element on every row

Suppose I have a pandas DataFrame with 3 columns that looks like this:

     col1     col2     col3
0  banana1  banana2  banana2
1   apple1   apple2   apple3
2  monkey1  monkey2  monkey3
3  iphone1  iphone2  iphone3
4  runner1  runner2  runner3
5     pig1     pig2     pig3
6    wifi1    wifi2    wifi3
7    girl1    girl2    girl3
8     boy1     boy2     boy3
9  couple1  couple2  couple3

How can I randomly select 1 of the 3 elements in every row and append it to a row of a new dataframe, repeat that pass over all the rows N times, and then move on to the next row of the new dataframe and do the same again?

This is a little hard to explain, so I'll explain with an example:

import pandas as pd

data = {'col1': ['banana1', 'apple1', 'monkey1', 'iphone1', 'runner1', 'pig1', 'wifi1', 'girl1', 'boy1', 'couple1'],
        'col2': ['banana2', 'apple2', 'monkey2', 'iphone2', 'runner2', 'pig2', 'wifi2', 'girl2', 'boy2', 'couple2'],
        'col3': ['banana2', 'apple3', 'monkey3', 'iphone3', 'runner3', 'pig3', 'wifi3', 'girl3', 'boy3', 'couple3']}
df = pd.DataFrame(data, columns=['col1', 'col2' , 'col3'])

So what I want to do is randomly select either item1, item2 or item3 from every row and append it to a row in a new dataframe. When the 10th item has been selected, I want it to start over, doing this N times, and then move on to a new row in the new dataframe and loop N times again. Eventually I'd end up with something like this (with randomness):

    1       2      3       4       5       6    7     8     9    10       11      12     13      14      15      16   17    18    19   20
    banana3 apple2 monkey1 iphone2 runner2 pig1 wifi2 girl3 boy1 couple1  banana1 apple2 monkey2 iphone3 runner3 pig3 wifi2 girl1 boy1 couple3
    ........................................................................................................................................... 
    ...........................................................................................................................................
    ...........................................................................................................................................
    banana1 apple2 monkey2 iphone3 runner1 pig2 wifi3 girl1 boy3 couple2  banana2 apple1 monkey2 iphone2 runner2 pig1 wifi2 girl3 boy1 couple2

In this output I have selected 1 of the 3 elements on every row, looped over the rows 2 times (N = 2), and repeated that for each of the n rows in the new dataframe.

My attempt:

I'd like to do this with a function that gives me the desired output based on n and N.

new_df = []

def rand_element_selection(n,N):
    for row in df.iterrows: 
        element_holder = df.sample(1)
        new_df.append(placeholder)

n and N are not used above because I'm struggling to move forward.

Upvotes: 0

Views: 93

Answers (2)

EdChum

Reputation: 393973

IIUC, you can do this by calling sample with axis=1 and transposing the result:

In [172]:
n=3
N=2
df_list=[]
for i in range(n):
    df_list.append(pd.concat([df.sample(1, axis=1).T.reset_index(drop=True) for j in range(N)], axis=1, ignore_index=True))
pd.concat(df_list, ignore_index=True)    

Out[172]:
        0       1        2        3        4     5      6      7     8   \
0  banana2  apple3  monkey3  iphone3  runner3  pig3  wifi3  girl3  boy3   
1  banana2  apple2  monkey2  iphone2  runner2  pig2  wifi2  girl2  boy2   
2  banana2  apple2  monkey2  iphone2  runner2  pig2  wifi2  girl2  boy2   

        9        10      11       12       13       14    15     16     17  \
0  couple3  banana2  apple3  monkey3  iphone3  runner3  pig3  wifi3  girl3   
1  couple2  banana1  apple1  monkey1  iphone1  runner1  pig1  wifi1  girl1   
2  couple2  banana2  apple3  monkey3  iphone3  runner3  pig3  wifi3  girl3   

     18       19  
0  boy3  couple3  
1  boy1  couple1  
2  boy3  couple3  
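
Note that sample(1, axis=1) draws one whole column per pass, which is why each block of ten values in the output above comes from a single column. If an independent pick for every row is wanted instead, a minimal sketch with NumPy integer indexing might look like the following. The names rng, col_idx, row_idx and picked, and the seed value, are illustrative assumptions and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
data = {'col1': ['banana1', 'apple1', 'monkey1', 'iphone1', 'runner1', 'pig1', 'wifi1', 'girl1', 'boy1', 'couple1'],
        'col2': ['banana2', 'apple2', 'monkey2', 'iphone2', 'runner2', 'pig2', 'wifi2', 'girl2', 'boy2', 'couple2'],
        'col3': ['banana2', 'apple3', 'monkey3', 'iphone3', 'runner3', 'pig3', 'wifi3', 'girl3', 'boy3', 'couple3']}
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

n, N = 3, 2                      # n output rows, N passes over df per output row
rng = np.random.default_rng(0)   # the seed is an arbitrary choice, used only for reproducibility

values = df.to_numpy()
n_rows, n_cols = values.shape

# One random column index for every (output row, pass, source row) triple.
col_idx = rng.integers(0, n_cols, size=(n, N, n_rows))
# Source row indices; these broadcast against the last axis of col_idx.
row_idx = np.arange(n_rows)

# picked[a, b, r] == values[r, col_idx[a, b, r]], so each source row gets its own pick.
picked = values[row_idx, col_idx]

# Flatten the N passes of each output row into one row of N * n_rows columns.
new_df = pd.DataFrame(picked.reshape(n, N * n_rows))
print(new_df)
```

Column k of new_df always holds a value taken from row k % 10 of df, so the ordering banana, apple, ..., couple repeats N times across each output row.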

Upvotes: 1

user2285236

Reputation:

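The concatenation is mainly from EdChum's answer; np.random.choice picks one element from each row independently:

import numpy as np

n=3
N=2
df_list=[]
for i in range(n):
    # each df.apply call picks one random element per row; repeat N times and chain the picks
    df_list.append(pd.concat([df.apply(np.random.choice, axis=1) for j in range(N)], ignore_index=True))
# each Series in df_list becomes one row of the result after the transpose
new_df = pd.concat(df_list, axis=1, ignore_index=True).T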
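
Since the question asked for a function of n and N, the same approach could be wrapped up roughly as follows. This is only a sketch: the seed parameter, the use of np.random.default_rng, and the small demo frame are assumptions added for illustration, and n and N are assumed to be at least 1.

```python
import numpy as np
import pandas as pd


def rand_element_selection(df, n, N, seed=None):
    """Return n rows, each holding N passes of one random pick per row of df."""
    rng = np.random.default_rng(seed)  # seed is a hypothetical convenience parameter
    rows = []
    for _ in range(n):
        # One pass = one random element from every row of df, in row order.
        passes = [df.apply(lambda r: rng.choice(r.to_numpy()), axis=1) for _ in range(N)]
        # Chain the N passes into a single Series of length N * len(df).
        rows.append(pd.concat(passes, ignore_index=True))
    # Each Series becomes a column; transpose so each one ends up as a row.
    return pd.concat(rows, axis=1, ignore_index=True).T


# Trimmed-down demo frame (hypothetical, three rows only).
demo = pd.DataFrame({'col1': ['apple1', 'monkey1', 'pig1'],
                     'col2': ['apple2', 'monkey2', 'pig2'],
                     'col3': ['apple3', 'monkey3', 'pig3']})
result = rand_element_selection(demo, n=3, N=2, seed=0)
print(result)
```

Passing the dataframe in as an argument, and not reading a global df, keeps the function reusable with frames of any shape.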

Upvotes: 0
