Reputation: 45
col1 col2 col3
0 banana1 banana2 banana2
1 apple1 apple2 apple3
2 monkey1 monkey2 monkey3
3 iphone1 iphone2 iphone3
4 runner1 runner2 runner3
5 pig1 pig2 pig3
6 wifi1 wifi2 wifi3
7 girl1 girl2 girl3
8 boy1 boy2 boy3
9 couple1 couple2 couple3
How can I randomly select 1 out of 3 elements on every row and append it to a new dataframe where I want it to loop N times then move on and append 1 out of 3 elements on a new row and loop it N times?
import pandas as pd
data = {'col1': ['banana1', 'apple1', 'monkey1', 'iphone1', 'runner1', 'pig1', 'wifi1', 'girl1', 'boy1', 'couple1'],
'col2': ['banana2', 'apple2', 'monkey2', 'iphone2', 'runner2', 'pig2', 'wifi2', 'girl2', 'boy2', 'couple2'],
'col3': ['banana2', 'apple3', 'monkey3', 'iphone3', 'runner3', 'pig3', 'wifi3', 'girl3', 'boy3', 'couple3']}
df = pd.DataFrame(data, columns=['col1', 'col2' , 'col3'])
So what I want to do is randomly select either item1, item2 or item3 for every row and append it to a new row in a new dataframe. When the 10th item has been selected I want it to start over, doing this N times, and then move on to a new row in the new dataframe and loop N times again. Eventually ending up with something like this (with randomness):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
banana3 apple2 monkey1 iphone2 runner2 pig1 wifi2 girl3 boy1 couple1 banana1 apple2 monkey2 iphone3 runner3 pig3 wifi2 girl1 boy1 couple3
...........................................................................................................................................
...........................................................................................................................................
...........................................................................................................................................
banana1 apple2 monkey2 iphone3 runner1 pig2 wifi3 girl1 boy3 couple2 banana2 apple1 monkey2 iphone2 runner2 pig1 wifi2 girl3 boy1 couple2
In this output I have selected 1 of the 3 elements on every row, looped over the rows 2 times (N = 2) for each row of the new dataframe, and repeated that for n rows in the new dataframe.
I'd love to do it by a function which will give me the desired output based on n and N.
new_df = []
def rand_element_selection(n,N):
    for row in df.iterrows:
        element_holder = df.sample(1)
        new_df.append(placeholder)
n and N are not defined above because I'm struggling to move forward.
Upvotes: 0
Views: 93
Reputation: 393973
IIUC you can do this by calling sample on axis=1 and transpose:
In [172]:
n=3
N=2
df_list=[]
for i in range(n):
    df_list.append(pd.concat([df.sample(1, axis=1).T.reset_index(drop=True) for j in range(N)], axis=1, ignore_index=True))
pd.concat(df_list, ignore_index=True)
Out[172]:
0 1 2 3 4 5 6 7 8 \
0 banana2 apple3 monkey3 iphone3 runner3 pig3 wifi3 girl3 boy3
1 banana2 apple2 monkey2 iphone2 runner2 pig2 wifi2 girl2 boy2
2 banana2 apple2 monkey2 iphone2 runner2 pig2 wifi2 girl2 boy2
9 10 11 12 13 14 15 16 17 \
0 couple3 banana2 apple3 monkey3 iphone3 runner3 pig3 wifi3 girl3
1 couple2 banana1 apple1 monkey1 iphone1 runner1 pig1 wifi1 girl1
2 couple2 banana2 apple3 monkey3 iphone3 runner3 pig3 wifi3 girl3
18 19
0 boy3 couple3
1 boy1 couple1
2 boy3 couple3
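One thing worth noting about the output above: within each block of 10, every value comes from the same source column (all "3"s, all "2"s, and so on). That appears to be because sample(1, axis=1) draws a single column label and returns that whole column, rather than picking per row. A minimal sketch to check this, using a shortened stand-in for the question's df (the three-row frame and the random_state value are assumptions for illustration only):

```python
import pandas as pd

# Small stand-in for the question's df (three rows are enough to show the effect).
df = pd.DataFrame({
    'col1': ['banana1', 'apple1', 'monkey1'],
    'col2': ['banana2', 'apple2', 'monkey2'],
    'col3': ['banana3', 'apple3', 'monkey3'],
})

# sample(1, axis=1) draws ONE column label and returns that whole column.
block = df.sample(1, axis=1, random_state=0)  # random_state only for repeatability
picked = block.columns[0]

# Every value in the block comes from the same source column.
same_column = block[picked].equals(df[picked])
print(picked, same_column)
```

If an independent pick per row is needed, the selection has to happen row-wise instead of on the whole frame.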
Upvotes: 1
Reputation:
Concatenation is mainly from EdChum's answer:
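import numpy as np

n=3
N=2
df_list=[]
for i in range(n):
    # np.random.choice picks one element from each row; repeat N times and stack end to end
    df_list.append(pd.concat([df.apply(np.random.choice, axis=1) for j in range(N)], ignore_index=True))
# one column per pass, then transpose so each pass becomes a row
new_df = pd.concat(df_list, axis=1, ignore_index=True).T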
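Since the question asks for a function of n and N, the per-row selection could also be packaged as below. This is only a sketch under some assumptions: the signature (passing df explicitly, plus an optional seed parameter) is not from the question, and it swaps the apply loop for NumPy integer indexing, drawing one random column index for every cell of the output.

```python
import numpy as np
import pandas as pd

def rand_element_selection(df, n, N, seed=None):
    """Return an n-row frame; each row cycles through df's rows N times,
    picking one random column value per source row (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = df.to_numpy()
    n_rows, n_cols = values.shape
    # Source row for each output column: 0..n_rows-1 repeated N times.
    row_idx = np.tile(np.arange(n_rows), N)
    # Independent random column index for every output cell.
    col_idx = rng.integers(0, n_cols, size=(n, N * n_rows))
    # row_idx broadcasts against col_idx, giving an (n, N * n_rows) array.
    return pd.DataFrame(values[row_idx, col_idx])

data = {'col1': ['banana1', 'apple1', 'monkey1', 'iphone1', 'runner1', 'pig1', 'wifi1', 'girl1', 'boy1', 'couple1'],
        'col2': ['banana2', 'apple2', 'monkey2', 'iphone2', 'runner2', 'pig2', 'wifi2', 'girl2', 'boy2', 'couple2'],
        'col3': ['banana2', 'apple3', 'monkey3', 'iphone3', 'runner3', 'pig3', 'wifi3', 'girl3', 'boy3', 'couple3']}
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3'])

result = rand_element_selection(df, n=3, N=2, seed=0)
print(result.shape)
```

With n=3 and N=2 on the 10-row frame this should give 3 rows and 20 columns, labelled 0 to 19 rather than 1 to 20 as in the question's example.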
Upvotes: 0