Reputation: 541
I want to simulate a variable that takes values between 0 and 1, but I also want 80% of its values to be zero. Currently I am doing the following:
data['response']=np.random.uniform(0,1,15000)#simulate response
data['response']=data['response'].apply(lambda x:0 if x<0.85 else x)
But this leaves only the extreme values (0 and 0.8+) in the variable. I want 80% of the rows to be zero and the remaining 20% to have values between zero and one, with the zeros placed at random.
Upvotes: 1
Views: 1424
Reputation: 674
Here's another one using numpy.random.shuffle:
import numpy as np
# Proportion of zeros in the final array
proportion = .8
n_non_zeros = 200
# Generate fake non-zero data.
# uniform() samples [0, 1); subtracting from 1 gives (0, 1], so no value is exactly 0
non_zeros = 1 - np.random.uniform(size=n_non_zeros)
# Number of zeros needed: proportion / (1 - proportion) zeros for each non-zero
n_zeros = int(round(n_non_zeros * proportion / (1 - proportion)))
# Append the zeros to the non-zero values
data = np.concatenate([non_zeros, np.zeros(n_zeros)])
# Shuffle data in place
np.random.shuffle(data)
# 'data' now contains 200 non-zeros and 800 zeros
# They are 20% and 80% of 1000
Upvotes: 2
Reputation: 523304
We could draw numbers from a uniform distribution extended to the negative side, then take max
with zero:
>>> numpy.maximum(0, numpy.random.uniform(-4, 1, 15000))
array([ 0.57310319, 0. , 0.02696571, ..., 0. ,
0. , 0. ])
>>> a = _
>>> sum(a <= 0)
12095
>>> sum(a > 0)
2905
>>> 12095 / 15000
0.8063333333333333
Here -4 is used because 4 / (4 + 1) = 80%.
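For a zero fraction p other than 80%, the same reasoning gives a lower bound of -p / (1 - p). Here is a minimal sketch of that generalization; the function name zero_inflated_uniform and its parameters are made up for illustration, and it uses NumPy's newer Generator API rather than the numpy.random.uniform call shown above.

```python
import numpy as np

def zero_inflated_uniform(n, zero_fraction, rng=None):
    """Draw n values: roughly zero_fraction of them are 0, the rest uniform on (0, 1).

    Hypothetical helper for illustration only.
    """
    if not 0 <= zero_fraction < 1:
        raise ValueError("zero_fraction must be in [0, 1)")
    rng = np.random.default_rng() if rng is None else rng
    # Choose the lower bound so the negative part has probability zero_fraction:
    # low / (low + 1) = p  =>  low = p / (1 - p)
    low = zero_fraction / (1.0 - zero_fraction)
    # Negative draws are clipped to 0; positive draws are kept as they are
    return np.maximum(0.0, rng.uniform(-low, 1.0, n))

# Seeded only so that the sketch is repeatable
sample = zero_inflated_uniform(15000, 0.8, rng=np.random.default_rng(0))
zero_share = np.mean(sample == 0)
```

As in the session above, the share of zeros is only approximately 80%, because each element is zeroed independently.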
Since the result is a sparse array, perhaps a SciPy sparse matrix is more appropriate.
>>> a = scipy.sparse.rand(1, 15000, 0.2)
>>> a.toarray()
array([[ 0. , 0.03971366, 0. , ..., 0. ,
0. , 0.9252341 ]])
Here 0.2 = 1 − 0.8 is the density of the array. The nonzero numbers are distributed uniformly between 0 and 1.
Upvotes: 1
Reputation: 221574
Here's one approach with np.random.choice, which suits this case: with its optional argument replace set to False (or 0), it generates unique indices along the entire length of 15000. We then generate the random numbers with np.random.uniform and assign them at those indices.
Thus, the implementation would look something along these lines -
# Parameters
s = 15000 # Length of array
zeros_ratio = 0.8 # Ratio of zeros expected in the array
out = np.zeros(s) # Initialize output array
nonzeros_count = int(np.rint(s*(1-zeros_ratio))) # Count of nonzeros in array
# Generate unique indices where nonzeros are to be placed
idx = np.random.choice(s, nonzeros_count, replace=0)
# Generate nonzeros between 0 and 1
nonzeros_num = np.random.uniform(0,1,nonzeros_count)
# Finally assign into those unique positions
out[idx] = nonzeros_num
Sample run results -
In [233]: np.isclose(out, 0).sum()
Out[233]: 12000
In [234]: (~np.isclose(out, 0)).sum()
Out[234]: 3000
Upvotes: 1
Reputation: 22979
Building on your code, you can just rescale x when it is at least 0.8, so that the range [0.8, 1) is stretched onto [0, 1):
lambda x: 0 if x < 0.8 else 5 * (x - 0.8)
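The same rescaling can also be applied to the whole array at once instead of element by element through apply. A rough sketch, assuming np.where as the vectorized stand-in for the lambda (the names u and response are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
u = rng.uniform(0, 1, 15000)

# Values below 0.8 become 0; the rest are stretched from [0.8, 1) onto [0, 1)
response = np.where(u < 0.8, 0.0, 5 * (u - 0.8))
```

The result can be assigned to data['response'] directly. About 80% of the entries come out as zero, not exactly 80%, since each draw is thresholded independently.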
Upvotes: 1