Michael D

Reputation: 1747

Assign value to random rows inside DataFrame

Let's say I have a subset of a DataFrame selected by a condition (for example, column a > 5).

I want to assign 0 (for example) to a random 70% of the rows in that subset, while preserving all other rows.

The current index is not unique.
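Because the index is not unique, a label-based lookup with `.loc` can match more than one row per label. This is why the approaches here either reset the index or work with integer positions. A minimal illustration using the example frame from the question:

```python
import numpy as np
import pandas as pd

# The example frame from the question: labels 1 and 2 each appear twice
df = pd.DataFrame(np.array([[5, 2], [4, 5], [7, 8], [10, 11]]),
                  columns=['a', 'b'], index=[1, 2, 1, 2])
df.index.name = 'some_index'

# A label lookup returns every row carrying that label, not a single row
rows_labelled_1 = df.loc[1]
print(rows_labelled_1)  # two rows: (5, 2) and (7, 8)

# Positional lookup with .iloc is unaffected by duplicate labels
first_row = df.iloc[0]
print(first_row.tolist())  # [5, 2]
```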

Input:

|   some_index |   a |   b |
|-------------:|----:|----:|
|            1 |   5 |   2 |
|            2 |   4 |   5 |
|            1 |   7 |   8 |
|            2 |  10 |  11 |

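Output (one possible random result; this example uses the condition `a > 4` from the code below, so 2 of the 3 matching rows are zeroed):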

|   some_index |   a |   b |
|-------------:|----:|----:|
|            1 |   0 |   0 |
|            2 |   4 |   5 |
|            1 |   0 |   0 |
|            2 |  10 |  11 |

I've come up with the following solution:

import numpy as np
import pandas as pd
from random import shuffle

df2 = pd.DataFrame(np.array([[5, 2], [4, 5], [7, 8], [10, 11]]),
                   columns=['a', 'b'], index=[1, 2, 1, 2])
df2.index.name = 'some_index'
print(df2)

df2.reset_index(inplace=True)  # reset the index to get a unique one
ind = df2['a'] > 4  # some condition
# row labels (now unique) of the rows that satisfy the condition
ind_by_cond = [row_number for row_number, bool_value in zip(ind.index, ind) if bool_value]
shuffle(ind_by_cond)  # shuffle so that the rows to change are chosen randomly

# 0.7 is the 70% of the subset that I would like to change
upper_limit = int(len(ind_by_cond) * 0.7)
df2.loc[ind_by_cond[:upper_limit], ['a', 'b']] = 0
df2.set_index('some_index', inplace=True)  # restore the original index

print(df2)
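For comparison, here is a sketch that avoids resetting the index by working with integer positions instead of labels. It assumes NumPy's `Generator` API (`np.random.default_rng`, NumPy 1.17+); the seed and the names `mask`, `candidates` and `chosen` are illustrative choices, not part of the question. Like the code above, it truncates 70% of the subset with `int(...)`:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.array([[5, 2], [4, 5], [7, 8], [10, 11]]),
                   columns=['a', 'b'], index=[1, 2, 1, 2])
df2.index.name = 'some_index'

rng = np.random.default_rng(seed=0)  # seeded only to make the run reproducible

mask = (df2['a'] > 4).to_numpy()       # same condition as in the question
candidates = np.flatnonzero(mask)      # integer positions of the matching rows
n_change = int(len(candidates) * 0.7)  # 70% of the subset, rounded down
chosen = rng.choice(candidates, size=n_change, replace=False)

cols = df2.columns.get_indexer(['a', 'b'])  # column positions to overwrite
df2.iloc[chosen, cols] = 0  # positional assignment; the index is never touched

print(df2)
```

Since `.iloc` never looks at the labels, the duplicate values in `some_index` do not matter and no reset/restore round trip is needed.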

Is there a simpler, more elegant (Pythonic) solution?

P.S. This question is different from "Randomly assign values to subset of rows in pandas dataframe".

Upvotes: 0

Views: 889

Answers (1)

DaveR

Reputation: 2348

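You can try something like this, using built-in pandas functions:

import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.array([[5, 2], [4, 5], [7, 8], [10, 11]]),
                   columns=['a', 'b'], index=[1, 2, 1, 2])
df2 = df2.reset_index(drop=True)  # unique 0..n-1 index; the original labels are discarded
selected = df2.loc[df2['a'] > 5, :]  # rows that meet the condition
fraction_selected = selected.sample(frac=.7)  # a random 70% of those rows
fraction_selected[:] = 0  # zero out the sampled rows
df2.update(fraction_selected)  # write them back into df2, aligned on the index
print(df2)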
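The same `sample` idea can also keep the original non-unique index (which `reset_index(drop=True)` discards) by moving it into a column temporarily and assigning with `.loc` instead of `update`. This is a sketch under that assumption; `random_state=0` is only there for reproducibility, and `tmp`, `picked` and `result` are illustrative names:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.array([[5, 2], [4, 5], [7, 8], [10, 11]]),
                   columns=['a', 'b'], index=[1, 2, 1, 2])
df2.index.name = 'some_index'

# Move the non-unique labels into a column so the new RangeIndex is unique
tmp = df2.reset_index()

# Sample 70% of the matching rows and keep only their (unique) labels
picked = tmp[tmp['a'] > 5].sample(frac=0.7, random_state=0).index

# Assign directly by those labels, then restore the original index
tmp.loc[picked, ['a', 'b']] = 0
result = tmp.set_index('some_index')

print(result)
```

Note that `sample(frac=...)` rounds the sample size (here 70% of 2 rows gives 1 row), whereas the question's `int(...)` truncates, so for small subsets the two can pick a different number of rows.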

Upvotes: 1
