Reputation: 1111
I have a CSV with 2 columns: "Context" and "Utterance".
I need to shuffle (put in random order) the values of the "Context" column. Note that I don't want to shuffle full rows, only that one column; the order of the second column, "Utterance", must stay the same.
For this I used the answers from "shuffling/permutating a DataFrame in pandas":
train_df2 = pd.read_csv("./data/nolabel.csv", encoding='utf-8', sep=",")
train_df2.drop('Utterance', axis=1, inplace=True) # delete 'Utterance'
train_df2 = train_df2.sample(frac=1) # shuffle
train_df2['Utterance'] = train_moscow_df['Utterance'] # add back 'Utterance'
train_df2["Label"] = 0
header = ["Context", "Utterance", "Label"] #
train_df2.to_csv('./data/label0.csv', columns = header, encoding='utf-8', index = False)
BUT the result is bad: the full rows get shuffled, and the corresponding values in the 2 columns are still paired the same way.
I need the 1st value of the 1st column to correspond to a random value from the 2nd. (I also tried from sklearn.utils import shuffle, but no luck either.)
Upvotes: 3
Views: 7635
Reputation: 394061
The problem is that when the df is shuffled, the index is shuffled along with it; when you then add the original column back, it aligns on the original index, so every row ends up paired as before. You can call reset_index so that it doesn't do this:
train_df2 = train_df2.sample(frac=1) # shuffle
train_df2.reset_index(inplace=True, drop=True)
train_df2['Utterance'] = train_moscow_df['Utterance'] # add back 'Utterance'
Example:
In [196]:
# setup
df = pd.DataFrame(np.random.randn(5,2), columns=list('ab'))
df
Out[196]:
          a         b
0  0.116596 -0.684748
1 -0.133922 -0.969933
2  0.103551  0.912101
3 -0.279751 -0.348443
4  1.453413  0.062378
now we drop and shuffle as before, note the index values
In [197]:
a = df.drop('b', axis=1)
a = a.sample(frac=1)
a
Out[197]:
          a
3 -0.279751
0  0.116596
1 -0.133922
4  1.453413
2  0.103551
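If the column were added back at this point, before resetting, pandas would align on those index labels and undo the shuffle. A minimal sketch of that failure, assuming a small made-up frame (the data and the random_state value are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the question.
df = pd.DataFrame({
    "Context": ["c0", "c1", "c2", "c3"],
    "Utterance": ["u0", "u1", "u2", "u3"],
})

# Shuffle only the Context column; the index labels travel with the rows.
shuffled = df[["Context"]].sample(frac=1, random_state=0)

# Assigning a Series aligns on index labels, not on position, so each
# Utterance lands next to its original Context again.
shuffled["Utterance"] = df["Utterance"]

# Compare the numeric suffixes: "c2" is still next to "u2", and so on.
pairs_unchanged = all(
    ctx[1:] == utt[1:]
    for ctx, utt in zip(shuffled["Context"], shuffled["Utterance"])
)
print(pairs_unchanged)  # True: the rows moved, but the pairing did not
```

This is the "full rows shuffle" effect described in the question.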
now reset
In [198]:
a.reset_index(inplace=True, drop=True)
a
Out[198]:
          a
0 -0.279751
1  0.116596
2 -0.133922
3  1.453413
4  0.103551
we can add the column back but retain shuffled order:
In [199]:
a['b'] = df['b']
a
Out[199]:
          a         b
0 -0.279751 -0.684748
1  0.116596 -0.969933
2 -0.133922  0.912101
3  1.453413 -0.348443
4  0.103551  0.062378
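An alternative that avoids the drop/reset steps is to assign the shuffled values as a plain NumPy array, since array assignment is positional and skips index alignment. This is a sketch under assumptions rather than the method above: the toy data and the seed are made up, and it swaps in NumPy's Generator.permutation for DataFrame.sample.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names mirror the question.
df = pd.DataFrame({
    "Context": ["c0", "c1", "c2", "c3", "c4"],
    "Utterance": ["u0", "u1", "u2", "u3", "u4"],
})

# The seed is an arbitrary choice, used only to make the run reproducible.
rng = np.random.default_rng(seed=0)

shuffled = df.copy()
# permutation() returns a shuffled copy of the values as a NumPy array.
# Assigning an array is positional, so no index alignment takes place.
shuffled["Context"] = rng.permutation(df["Context"].to_numpy())
shuffled["Label"] = 0

print(shuffled)
```

The "Utterance" column keeps its original order and the index stays 0..n-1. Note that a random permutation can leave some rows paired with their original value by chance.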
Upvotes: 4