Reputation: 19
I get duplicate rows when I use this code. What am I doing wrong with the random sampling?
data = data[data["VN"] >= 1000]
data_T1 = data[data["TARGET"] == 1]
data_T0 = data[data["TARGET"] == 0]
data_T0_random = data_T0.loc[np.random.choice(data_T0.index, 10000)]
data = data_T1.append(data_T0_random)
print('q:', len(data.index))
rr = data.drop_duplicates()
print('qq:', len(rr.index))
Upvotes: 0
Views: 53
Reputation: 383
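Change this line:
data_T0_random = data_T0.loc[np.random.choice(data_T0.index, 10000)]
to:
data_T0_random = data_T0.loc[random.sample(list(data_T0.index), 10000)]
This needs import random. random.sample expects a sequence, not a DataFrame, so sample the index labels and select those rows with .loc.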
More info (from the Python random module docs; np.random.choice also samples with replacement by default, like random.choices):
random.choices(population, weights=None, *, cum_weights=None, k=1): Return a k sized list of elements chosen from the population with replacement. If the population is empty, raises IndexError.
random.sample(population, k): Return a k length list of unique elements chosen from the population sequence. Used for random sampling without replacement.
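As a rough illustration of the difference between the two functions, here is a minimal sketch using only the standard library. The names population, with_replacement and without_replacement are made up for this example, and the seed is there only to make the run repeatable:

```python
import random

# Hypothetical stand-in for the index labels of data_T0.
population = list(range(20))

rng = random.Random(0)  # seeded only so the sketch is repeatable

# choices() samples WITH replacement: drawing 25 items from 20 labels
# must repeat at least some of them.
with_replacement = rng.choices(population, k=25)

# sample() samples WITHOUT replacement: every drawn label is distinct.
without_replacement = rng.sample(population, k=15)

print(len(set(with_replacement)), "unique of", len(with_replacement))
print(len(set(without_replacement)), "unique of", len(without_replacement))
```

Sampling with replacement is what produces the repeated rows that drop_duplicates() then removes in the question.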
Upvotes: 0
Reputation: 82795
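Use replace=False. By default np.random.choice samples with replacement (replace=True), so the same index label can be picked more than once, which gives you duplicate rows.
For example: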
data_T0_random=data_T0.loc[np.random.choice(data_T0.index, 10000, replace=False)]
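A small self-contained sketch of this, assuming a made-up 50-row frame in place of the real data_T0 (the column values, index range, and seed below are all hypothetical). It also shows pandas' own DataFrame.sample, which samples without replacement by default and may be a simpler alternative:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_T0: 50 rows with a non-default index.
data_T0 = pd.DataFrame(
    {"VN": np.arange(1000, 1050), "TARGET": 0},
    index=np.arange(100, 150),
)

# Option 1: draw index labels without replacement, then select with .loc.
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
picked = rng.choice(data_T0.index, size=20, replace=False)
sample_np = data_T0.loc[picked]

# Option 2: let pandas do the sampling (replace=False is its default).
sample_pd = data_T0.sample(n=20, random_state=0)

# Both samples contain 20 distinct rows.
print(sample_np.index.is_unique, sample_pd.index.is_unique)
```

Note that with replace=False the requested sample size cannot exceed the number of rows in data_T0, otherwise a ValueError is raised.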
Upvotes: 1