Reputation: 1747
Let's say I have a subset of a dataframe selected by a condition (column a > 5, for example).
I want to assign 0 (for example) to a random 70% of that subset while leaving all other rows unchanged.
The current index is not unique.
Input:
| some_index | a | b |
|-------------:|----:|----:|
| 1 | 5 | 2 |
| 2 | 4 | 5 |
| 1 | 7 | 8 |
| 2 | 10 | 11 |
Output:
| some_index | a | b |
|-------------:|----:|----:|
| 1 | 0 | 0 |
| 2 | 4 | 5 |
| 1 | 0 | 0 |
| 2 | 10 | 11 |
I came up with the following solution:
import random
import numpy as np
import pandas as pd
df2 = pd.DataFrame(np.array([[5, 2], [4, 5], [7, 8], [10, 11]]),
                   columns=['a', 'b'], index=[1, 2, 1, 2])
df2.index.name = 'some_index'
print(df2)
df2.reset_index(inplace=True)  # reset the index so that row labels are unique
ind = df2['a'] > 4  # some condition
ind_by_cond = [row_number for row_number, bool_value in zip(ind.index, ind) if bool_value]
random.shuffle(ind_by_cond)  # shuffle so the rows to change are picked at random
# 0.7 is the fraction (70%) of the subset that I would like to change
upper_limit = int(len(ind_by_cond) * 0.7)
df2.loc[ind_by_cond[:upper_limit], ['a', 'b']] = 0
df2.set_index('some_index', inplace=True)  # restore the original index
print(df2)
Is there a simpler, more elegant (Pythonic) solution?
P.S. This question is different from: Randomly assign values to subset of rows in pandas dataframe
Upvotes: 0
Views: 889
Reputation: 2348
You can try something like this using pandas functions:
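import numpy as np
import pandas as pd
df2 = pd.DataFrame(np.array([[5, 2], [4, 5], [7, 8], [10, 11]]),
                   columns=['a', 'b'], index=[1, 2, 1, 2])
df2 = df2.reset_index(drop=True)  # unique index for alignment; note that this discards the original index
selected = df2.loc[df2['a'] > 5, :]  # rows matching the condition
fraction_selected = selected.sample(frac=.7)  # random 70% of those rows
fraction_selected[:] = 0
df2.update(fraction_selected)  # write the zeroed rows back, aligned on the index
print(df2)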
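If the non-unique index has to be kept, one possible alternative is to work with row positions instead of labels, so no reset_index / set_index round trip is needed. The following is only a sketch under some assumptions: the helper name `zero_fraction_where` and its parameters `frac`, `columns` and `seed` are made up for illustration, and the row count is rounded down with int() as in the question.

```python
import numpy as np
import pandas as pd


def zero_fraction_where(df, mask, frac=0.7, columns=None, seed=None):
    """Return a copy of df with `columns` set to 0 in a random
    `frac` of the rows where `mask` is True (hypothetical helper)."""
    out = df.copy()
    rng = np.random.default_rng(seed)
    # Integer positions of the matching rows. Positions are always unique,
    # so duplicate index labels do not matter here.
    positions = np.flatnonzero(np.asarray(mask))
    n_pick = int(len(positions) * frac)  # rounds down, as in the question
    chosen = rng.choice(positions, size=n_pick, replace=False)
    cols = out.columns if columns is None else columns
    col_pos = out.columns.get_indexer(cols)
    # iloc assigns by position: every chosen row, every requested column.
    out.iloc[chosen, col_pos] = 0
    return out


df2 = pd.DataFrame(
    np.array([[5, 2], [4, 5], [7, 8], [10, 11]]),
    columns=["a", "b"],
    index=pd.Index([1, 2, 1, 2], name="some_index"),
)
result = zero_fraction_where(df2, df2["a"] > 4, frac=0.7, seed=0)
print(result)
```

With the condition a > 4 three rows match, so int(3 * 0.7) = 2 of them are zeroed; which two depends on the seed. The original some_index labels are left as they were, and df2 itself is not modified because the helper works on a copy.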
Upvotes: 1