Add random noise to a dataframe

Question

I have a dataframe with this kind of data:

      0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   ...  309  310  311  312  313  314  315  316  317  318  319  320  321  322  323  324  325  326  327  328
0      18    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0  ...    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
1      84    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0  ...    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
2      50    1    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0  ...    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

The dataframe's shape is (10000, 329).

I would like to randomly turn 5% of the 1s in the dataframe into 0s.

Is this possible?

Code Different · Accepted Answer

Try this:

import numpy as np

# Get all columns from 1 to 328 and stack them into a temp series
tmp = df.loc[:, 1:].stack()

# Get the 1s
ones = tmp[tmp == 1].values.astype('int8')

# Number of 1s to turn into 0s (5% of them). ceil or floor both
# work here, as long as the result is an integer
n_zero = np.ceil(ones.shape[0] * .05).astype('int')

# Make the 0s
zeros = np.zeros(n_zero, dtype='int8')

# Replace 5% of the 1s with 0s and shuffle them
noise = np.concatenate((ones[n_zero:], zeros))
np.random.shuffle(noise)

# Assign the noise back to `tmp`
tmp.loc[tmp == 1] = noise

# Assign the noise back to the original frame
df.loc[:, 1:] = tmp.unstack()
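A variation that skips the stack/unstack round trip is to pick the positions of the 1s directly in the underlying NumPy array. This is a sketch, not the answer's code: `flip_ones` and the toy frame are hypothetical, it assumes (like the answer) that column 0 holds non-binary values and every other column is integer-typed, and it swaps the shuffle for `Generator.choice(..., replace=False)`, which picks distinct cells to zero out.

```python
import numpy as np
import pandas as pd


def flip_ones(df: pd.DataFrame, frac: float = 0.05, seed=None) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a random `frac` of the 1s
    in columns 1 onward set to 0. Column 0 is assumed non-binary and is skipped."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    # Copy of every column except the first, as a 2-D NumPy array
    values = out.iloc[:, 1:].to_numpy(copy=True)
    # Flat (row-major) positions of every cell that holds a 1
    one_idx = np.flatnonzero(values == 1)
    # Round the 5% up to a whole number of cells, as the answer does
    n_zero = int(np.ceil(one_idx.size * frac))
    # Choose n_zero distinct positions among the 1s and zero them
    chosen = rng.choice(one_idx, size=n_zero, replace=False)
    values.flat[chosen] = 0
    # Write the modified block back into the copy
    out.iloc[:, 1:] = values
    return out


# Toy data standing in for the real frame: binary columns 1..10, age-like column 0
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.integers(0, 2, size=(200, 11)))
toy[0] = rng.integers(18, 90, size=200)
noisy = flip_ones(toy, frac=0.05, seed=1)
```

Passing a seed makes the draw reproducible; leave `seed=None` for a fresh draw each call. Because the helper returns a copy, you can check it by comparing the input frame with the returned one rather than running the sum before and after an in-place assignment.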

You can check that 5% of the 1s were replaced with 0s by comparing the sum of the frame before and after:

# Run this before and after the `df.loc[:, 1:] = tmp.unstack()` line to verify
df.loc[:, 1:].values.sum()
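The sum only shows how many 1s disappeared in total. A slightly stricter check, sketched here on a small synthetic frame, also confirms that the drop equals the rounded-up 5%, that no 0 became a 1, and that column 0 is untouched. So that it runs on its own, it repeats the answer's steps inside a hypothetical `add_noise` function; `add_noise`, `ones_count`, and the toy frame are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd


def add_noise(df: pd.DataFrame, frac: float = 0.05) -> pd.DataFrame:
    """Hypothetical wrapper around the answer's steps; returns a new frame."""
    df = df.copy()
    tmp = df.loc[:, 1:].stack()
    ones = tmp[tmp == 1].to_numpy()
    n_zero = int(np.ceil(ones.shape[0] * frac))
    # Keep all but n_zero of the 1s, append n_zero zeros, then shuffle
    noise = np.concatenate((ones[n_zero:], np.zeros(n_zero, dtype=ones.dtype)))
    np.random.shuffle(noise)
    tmp.loc[tmp == 1] = noise
    df.loc[:, 1:] = tmp.unstack()
    return df


def ones_count(frame: pd.DataFrame) -> int:
    # Number of cells equal to 1, ignoring column 0
    return int((frame.loc[:, 1:] == 1).to_numpy().sum())


np.random.seed(0)  # make the toy data and the shuffle reproducible
toy = pd.DataFrame(np.random.randint(0, 2, size=(200, 11)))
toy[0] = np.random.randint(18, 90, size=200)  # non-binary first column
noisy = add_noise(toy)

before, after = ones_count(toy), ones_count(noisy)
print(before, after)
```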
