Reputation: 631
I have a pandas data frame and I want to set some percentage of a column to 0. Let's say the df has two columns.
A B
1 6
2 7
3 8
4 4
5 9
I now want to set B to 0 for the first and last 20% of the rows:
A B
1 0
2 7
3 8
4 4
5 0
Upvotes: 2
Views: 413
Reputation: 863196
Use numpy.r_ to join the first and last positions, then change the values with iloc; to get the position of column B, use Index.get_loc:
import numpy as np
N = .2
total = len(df.index)
# convert to int so the row count is always an integer
i = int(total * N)
# positions of the first i rows and the last i rows
idx = np.r_[0:i, total-i:total]
df.iloc[idx, df.columns.get_loc('B')] = 0
Or:
N = .2
total = len(df.index)
i = int(total * N)
pos = df.columns.get_loc('B')
df.iloc[:i, pos] = 0
df.iloc[total - i:, pos] = 0
print (df)
A B
0 1 0
1 2 7
2 3 8
3 4 4
4 5 0
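A minimal sketch wrapping the np.r_ approach above in a reusable helper. The function name zero_first_last and its parameters are hypothetical, not part of pandas; like the answer, it applies the fraction to each end separately and truncates the row count with int(). It works on a copy so the input frame is left unchanged.

```python
import numpy as np
import pandas as pd


def zero_first_last(df, column, frac):
    """Return a copy of df with `column` set to 0 in the first and last `frac` of rows.

    Hypothetical helper: the per-end row count is truncated with int(),
    as in the answer above.
    """
    out = df.copy()
    total = len(out.index)
    i = int(total * frac)  # rows to zero at each end
    # np.r_ concatenates the two position ranges into one integer array
    idx = np.r_[0:i, total - i:total]
    out.iloc[idx, out.columns.get_loc(column)] = 0
    return out


df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 4, 9]})
print(np.r_[0:1, 4:5])  # the positions used for 5 rows and frac=0.2
print(zero_first_last(df, 'B', 0.2))
```

With a fraction so small that int() gives 0 (for example 0.1 on 5 rows), both slices are empty and nothing is changed.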
EDIT:
If you have a SparseDataFrame and all columns have the same type, you can convert it to a NumPy array, set the values there and convert back:
arr = df.values
N = .2
total = len(df.index)
i = int(total * N)
pos = df.columns.get_loc('B')
idx = np.r_[0:i, total-i:total]
arr[idx, pos] = 0
print (arr)
[[1 0]
[2 7]
[3 8]
[4 4]
[5 0]]
df = pd.SparseDataFrame(arr, columns=df.columns)
print (df)
A B
0 1 0
1 2 7
2 3 8
3 4 4
4 5 0
print (type(df))
<class 'pandas.core.sparse.frame.SparseDataFrame'>
EDIT1:
Another solution is to convert to dense first and then convert back:
df = df.to_dense()
# apply one of the solutions above
df = df.to_sparse()
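SparseDataFrame and DataFrame.to_sparse() were removed in pandas 1.0; current pandas stores sparse data in ordinary DataFrame columns with a SparseDtype. A sketch of the same dense-then-sparse round trip in that model, assuming every column is a sparse integer column with fill value 0 (the dtype choice is an assumption for this example):

```python
import pandas as pd

# Assumption: all columns are sparse int64 columns with fill value 0
sparse_dtype = pd.SparseDtype('int64', 0)
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 4, 9]}).astype(sparse_dtype)

# .sparse.to_dense() requires every column to be sparse
dense = df.sparse.to_dense()

N = .2
total = len(dense.index)
i = int(total * N)
pos = dense.columns.get_loc('B')
dense.iloc[:i, pos] = 0
dense.iloc[total - i:, pos] = 0

# convert back to sparse columns
df = dense.astype(sparse_dtype)
print(df.dtypes)
```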
Upvotes: 1
Reputation: 12417
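You can do it like this:
x = 20  # percentage of rows to zero at each end
y = len(df.index)
z = int(round(y / 100 * x))  # number of rows to zero at each end
h = y - z  # position where the last z rows start
col = df.columns.get_loc('B')
# iloc avoids chained assignment (df['B'][:z] = 0), which may not update df
df.iloc[:z, col] = 0
df.iloc[h:, col] = 0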
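The first answer truncates the row count with int(), while this one rounds it with round(), so they can disagree once the fractional part reaches one half (Python 3's round() sends exact halves to the nearest even integer). A small comparison; the two function names are purely illustrative:

```python
def rows_truncated(n_rows, frac):
    # int() drops the fractional part, as in the first answer
    return int(n_rows * frac)


def rows_rounded(n_rows, frac):
    # round() goes to the nearest integer, as in this answer
    return round(n_rows * frac)


for n in (5, 7, 8, 13):
    print(n, rows_truncated(n, 0.2), rows_rounded(n, 0.2))
```

For 8 rows, 20% is 1.6 rows: truncation zeroes 1 row at each end, rounding zeroes 2.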
Upvotes: 0
Reputation: 19957
You can do:
num_rows = round(len(df)*0.2)
# compares index labels, so this assumes the default RangeIndex 0..n-1
df.loc[(df.index<num_rows) | (df.index[::-1]<num_rows), 'B'] = 0
df
Out[89]:
A B
0 1 0
1 2 7
2 3 8
3 4 4
4 5 0
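Because the mask above compares index labels, it relies on the default 0..n-1 index. A sketch of the same boolean-mask idea built from row positions with np.arange, so it also works with other indexes; the string index below is an assumption for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 4, 9]},
                  index=['a', 'b', 'c', 'd', 'e'])  # non-default index

num_rows = round(len(df) * 0.2)
positions = np.arange(len(df))  # 0..n-1 regardless of the index labels
# True for the first num_rows and the last num_rows positions
mask = (positions < num_rows) | (positions >= len(df) - num_rows)
df.loc[mask, 'B'] = 0
print(df)
```

If num_rows is 0, both comparisons are False everywhere and the frame is left unchanged.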
Upvotes: 0