user2906657

Reputation: 541

Randomly generate a larger proportion of zeros in Python

I want to simulate a variable that can take values between 0 and 1, but I also want 80% of its values to be zero. Currently I am doing the following:

data['response'] = np.random.uniform(0, 1, 15000)  # simulate response
data['response'] = data['response'].apply(lambda x: 0 if x < 0.8 else x)

But this leaves only the extreme values (0 and 0.8+) in the variable. I want 80% of the rows to be zero and the remaining 20% to have values anywhere between zero and one. The placement of the zeros has to be random.

Upvotes: 1

Views: 1424

Answers (4)

Garmekain

Reputation: 674

Here's another approach, using numpy.random.shuffle:

import numpy as np

# Proportion of zeros in the final array
proportion = .8
n_non_zeros = 200

# Generate fake non-zero data.
# uniform() samples from [0, 1), so 1 - uniform() lies in (0, 1] and is never 0
non_zeros = 1 - np.random.uniform(size=n_non_zeros)

# Number of zeros needed: [proportion / (1 - proportion)] zeros
# for each non-zero
n_zeros = int(round(n_non_zeros * proportion / (1 - proportion)))

# Join the non-zeros and the zeros into a single array
data = np.concatenate([non_zeros, np.zeros(n_zeros)])

# Shuffle data in place
np.random.shuffle(data)

# 'data' now contains 200 non-zeros and 800 zeros
# They are 20% and 80% of 1000

Upvotes: 2

kennytm

Reputation: 523304

We could draw numbers from a uniform distribution extended to the negative side, then take max with zero:

>>> numpy.maximum(0, numpy.random.uniform(-4, 1, 15000))
array([ 0.57310319,  0.        ,  0.02696571, ...,  0.        ,
        0.        ,  0.        ])
>>> a = _
>>> sum(a <= 0)
12095
>>> sum(a > 0)
2905
>>> 12095 / 15000
0.8063333333333333

Here -4 is used because 4 / (4 + 1) = 80%.
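
The same reasoning extends to any zero fraction p: the lower bound becomes -p / (1 - p). The following is a minimal sketch of that idea, not part of the original session. The function name `zero_inflated_uniform` and its parameters are hypothetical, and it uses NumPy's `default_rng` Generator API instead of the legacy `numpy.random.uniform` call.

```python
import numpy as np

def zero_inflated_uniform(n, zero_fraction=0.8, rng=None):
    """Draw n values in [0, 1); roughly zero_fraction of them are 0."""
    rng = np.random.default_rng() if rng is None else rng
    # Choose the lower bound so that the negative part of [low, 1)
    # has relative length zero_fraction: |low| / (|low| + 1) == zero_fraction
    low = -zero_fraction / (1.0 - zero_fraction)
    # Every negative draw is clipped to 0; positive draws are kept as-is
    return np.maximum(0.0, rng.uniform(low, 1.0, n))

sample = zero_inflated_uniform(15000, 0.8, np.random.default_rng(0))
print((sample == 0).mean())  # close to 0.8, but not exactly 0.8
```

As in the session above, the share of zeros is only approximately 80%, because each element is zero independently with probability 0.8.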


Since the result is a sparse array, perhaps a SciPy sparse matrix is more appropriate.

>>> a = scipy.sparse.rand(1, 15000, 0.2)
>>> a.toarray()
array([[ 0.        ,  0.03971366,  0.        , ...,  0.        ,
         0.        ,  0.9252341 ]])

Here 0.2 = 1 − 0.8 is the density of the array. The nonzero numbers are distributed uniformly between 0 and 1.

Upvotes: 1

Divakar

Reputation: 221574

Here's one approach with np.random.choice. With its optional argument replace set to False (or 0), it generates unique indices along the entire length of 15000. We then generate the random numbers with np.random.uniform and assign them at those indices.

Thus, the implementation would look something along these lines -

# Parameters
s = 15000 # Length of array
zeros_ratio = 0.8 # Ratio of zeros expected in the array

out = np.zeros(s) # Initialize output array
nonzeros_count = int(np.rint(s*(1-zeros_ratio))) # Count of nonzeros in array

# Generate unique indices where nonzeros are to be placed
idx = np.random.choice(s, nonzeros_count, replace=False)

# Generate nonzeros between 0 and 1
nonzeros_num = np.random.uniform(0,1,nonzeros_count)

# Finally assign into those unique positions
out[idx] = nonzeros_num

Sample run results -

In [233]: np.isclose(out, 0).sum()
Out[233]: 12000

In [234]: (~np.isclose(out, 0)).sum()
Out[234]: 3000
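
For reuse, the same steps could be wrapped in a function. This is only a sketch under a few assumptions: the name `exact_zero_fraction` is hypothetical, it uses NumPy's `default_rng` Generator API rather than the legacy `np.random` calls above, and it draws the nonzeros as `1 - rng.random(...)` so that they lie in (0, 1] and can never be exactly 0.

```python
import numpy as np

def exact_zero_fraction(n, zeros_ratio=0.8, rng=None):
    """Return an array of length n with an exact count of zeros."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.zeros(n)
    # Number of positions that will hold a nonzero value
    nonzeros_count = int(np.rint(n * (1 - zeros_ratio)))
    # Unique positions, sampled without replacement
    idx = rng.choice(n, size=nonzeros_count, replace=False)
    # rng.random() is in [0, 1), so 1 - rng.random() is in (0, 1]
    out[idx] = 1.0 - rng.random(nonzeros_count)
    return out

out = exact_zero_fraction(15000, 0.8, np.random.default_rng(42))
print(np.count_nonzero(out), (out == 0).sum())
```

Unlike thresholding a uniform draw, this fixes the number of zeros exactly rather than only in expectation.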

Upvotes: 1

BlackBear

Reputation: 22979

Building on your code, you can just rescale x when it is at least 0.8, so that the range [0.8, 1) is stretched onto [0, 1):

lambda x: 0 if x < 0.8 else 5 * (x - 0.8)
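
The same transform can also be applied to the whole array at once instead of element by element. Below is a minimal sketch using `np.where`. The variable names `u`, `threshold` and `response` are illustrative, and the factor 5 is written as 1 / (1 - threshold) so that other zero fractions work too.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0, 1, 15000)  # plain uniform draws in [0, 1)

threshold = 0.8  # desired fraction of zeros
# Draws below the threshold become 0; the rest are shifted and stretched
# so that [threshold, 1) maps onto [0, 1).
response = np.where(u < threshold, 0.0, (u - threshold) / (1 - threshold))

print((response == 0).mean())  # close to 0.8, but not exactly 0.8
```

The result can be assigned to a DataFrame column in the same way as the original `np.random.uniform` output.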

Upvotes: 1
