Reputation: 851
I want to select a random row during a vectorized operation on a DataFrame. This is what my inpDF looks like:
string1 string2
0 abc dfe
1 ghi jkl
2 mno pqr
3 stu vwx
I'm looking for a function that would play the role of getRandomRow() here:
outDF['string1'] = inpDF['string1']
outDF['string2'] = inpDF.getRandomRow()['string2']
so that outDF ends up looking (for example) like this:
string1 string2
0 abc jkl
1 ghi pqr
2 mno dfe
3 stu pqr
EDIT 1:
I tried using the sample() function as suggested in this answer, but that just causes the same sample to get replicated across all rows:
outDF['string1'] = inpDF['string1']
outDF['string2'] = inpDF.sample(n=1).iloc[0,:]['string2']
which gives:
string1 string2
0 abc pqr
1 ghi pqr
2 mno pqr
3 stu pqr
EDIT 2:
For my particular use case, even picking the value from 'n' rows down would suffice. So I tried this (I'm using inpDF.index based on what I read in this answer):
numRows = len(inpDF)
outDF['string1'] = inpDF['string1']
outDF['string2'] = inpDF.iloc[(inpDF.index + 2)%numRows,:]['string2']
but it just ends up picking the value from the same row, and outDF comes out like this:
string1 string2
0 abc dfe
1 ghi jkl
2 mno pqr
3 stu vwx
whereas I expected this:
string1 string2
0 abc pqr
1 ghi vwx
2 mno dfe
3 stu jkl
Upvotes: 0
Views: 126
Reputation: 42926
You can use pandas.DataFrame.sample for this:
df['string2'] = df.string2.sample(len(df.string2)).to_list()
print(df)
string1 string2
0 abc vwx
1 ghi jkl
2 mno def
3 stu pqr
Or:
df['string2'] = df.string2.sample(len(df.string2)).values
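Note that sample(len(df.string2)) draws without replacement, so the result is a permutation in which every value appears exactly once. The example output in the question repeats pqr, which suggests an independent draw per row may be wanted instead. A minimal sketch of that variant, assuming the question's inpDF / outDF names; the random_state=0 seed is my own addition for reproducibility and can be dropped:

```python
import pandas as pd

# Input frame reconstructed from the question.
inpDF = pd.DataFrame({
    "string1": ["abc", "ghi", "mno", "stu"],
    "string2": ["dfe", "jkl", "pqr", "vwx"],
})

outDF = pd.DataFrame({"string1": inpDF["string1"]})

# Draw one value per row, independently (replace=True allows repeats).
# .to_numpy() drops the sampled index so the values are assigned by
# position instead of being re-aligned to their original row labels.
outDF["string2"] = (
    inpDF["string2"]
    .sample(n=len(inpDF), replace=True, random_state=0)  # seed is an assumption
    .to_numpy()
)

print(outDF)
```

The conversion to an array (or .to_list() / .values as above) matters: assigning a Series aligns on index labels, which would either put each value back on its original row or, with repeated labels, fail to align at all.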
Upvotes: 1
Reputation: 75120
Try np.random.shuffle():
np.random.shuffle(df.string2)
print(df)
string1 string2
0 abc pqr
1 ghi vwx
2 mno def
3 stu jkl
If you don't want to shuffle in place, try:
df['string3'] = np.random.permutation(df.string2)
print(df)
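As for the fixed-offset attempt in EDIT 2 of the question: the iloc expression does pick the shifted rows, but the resulting Series keeps its original index labels, and pandas aligns on those labels during assignment, so every value lands back on its own row. A sketch of one way around this, assuming the question's inpDF and an offset of 2 (the shift variable name is mine), is to roll the underlying array with np.roll:

```python
import numpy as np
import pandas as pd

# Input frame reconstructed from the question.
inpDF = pd.DataFrame({
    "string1": ["abc", "ghi", "mno", "stu"],
    "string2": ["dfe", "jkl", "pqr", "vwx"],
})

shift = 2  # take the value from `shift` rows down, wrapping around

outDF = pd.DataFrame({"string1": inpDF["string1"]})

# np.roll with a negative shift moves later elements toward the front,
# so row i receives the value from row (i + shift) % len(inpDF).
# Working on the plain array avoids pandas' index alignment.
outDF["string2"] = np.roll(inpDF["string2"].to_numpy(), -shift)

print(outDF)
```

With this input, string2 comes out as pqr, vwx, dfe, jkl, which is the result the question was expecting.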
Upvotes: 1