Reputation: 631

Set percentage of column to 0 (pandas)

I have a pandas DataFrame and I want to set some percentage of a column to 0. Let's say the df has two columns, A and B.

I now want to set B to 0 for the first and last 20% of the rows.

Upvotes: 2

Answers (3)

jezrael

Reputation: 863196

Use numpy.r_ to join the first and last row positions, then set the values with iloc. To get the position of column B, use Index.get_loc:

import numpy as np

N = .2
total = len(df.index)
#convert to int so the row count is always an integer
i = int(total * N)
#positions of the first i and the last i rows
idx = np.r_[0:i, total-i:total]
df.iloc[idx, df.columns.get_loc('B')] = 0
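
To see what the np.r_ expression produces, here is a minimal sketch. The row count of 10 is an assumption chosen for illustration, not a value from the question:

```python
import numpy as np

total = 10                # assumed number of rows, for illustration only
i = int(total * 0.2)      # rows to zero at each end
# np.r_ concatenates the two slices into one array of positions
idx = np.r_[0:i, total - i:total]
print(idx)
```

The resulting array holds the positions of the first i and last i rows, which is what iloc expects as a row indexer.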

Or:

N = .2
total = len(df.index)
i = int(total * N)
pos = df.columns.get_loc('B')

df.iloc[:i, pos] = 0
df.iloc[total - i:, pos] = 0
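
The slice variant above can be wrapped in a small helper. This is only a sketch: the function name zero_ends and the sample values in B are hypothetical (the question does not show the original data), and the helper returns a copy rather than modifying the input.

```python
import pandas as pd

def zero_ends(df, column, fraction):
    """Return a copy of df with the first and last `fraction` of rows in `column` set to 0."""
    out = df.copy()
    total = len(out.index)
    i = int(total * fraction)          # rows to zero at each end, rounded down
    pos = out.columns.get_loc(column)  # integer position of the column
    out.iloc[:i, pos] = 0              # leading block
    out.iloc[total - i:, pos] = 0      # trailing block
    return out

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [9, 7, 8, 4, 6]})
result = zero_ends(df, "B", 0.2)
print(result)
```

Because int() truncates, a fraction that yields less than one row (for example 0.1 on five rows) leaves the column unchanged.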

EDIT:

If the data is a SparseDataFrame (a class that exists only in pandas versions before 1.0) and all values have the same dtype, you can convert it to a NumPy array, set the values, and convert back:

arr = df.values
N = .2
total = len(df.index)
i = int(total * N)
pos = df.columns.get_loc('B')
idx = np.r_[0:i, total-i:total]

arr[idx, pos] = 0
print (arr)
[[1 0]
 [2 7]
 [3 8]
 [4 4]
 [5 0]]

df = pd.SparseDataFrame(arr, columns=df.columns)
print (df)
   A  B
0  1  0
1  2  7
2  3  8
3  4  4
4  5  0

print (type(df))
<class 'pandas.core.sparse.frame.SparseDataFrame'>

EDIT1:

Another solution is to convert to dense first, apply the change, and then convert back:

df = df.to_dense()
#apply solution
df = df.to_sparse()
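
SparseDataFrame was removed in pandas 1.0, where sparse data is stored in ordinary DataFrame columns with a SparseDtype. A rough equivalent of the dense round trip for newer versions might look like the sketch below. It assumes int64 data with a fill value of 0, and the sample values are made up:

```python
import pandas as pd

sparse_dtype = pd.SparseDtype("int64", 0)

# Hypothetical sample data stored in sparse columns
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [9, 7, 8, 4, 6]}).astype(sparse_dtype)

# Convert to dense, apply the positional assignment, convert back
dense = df.sparse.to_dense()
total = len(dense.index)
i = int(total * 0.2)
pos = dense.columns.get_loc("B")
dense.iloc[:i, pos] = 0
dense.iloc[total - i:, pos] = 0
df = dense.astype(sparse_dtype)

print(df.dtypes)
```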

Upvotes: 1

Joe

Reputation: 12417

You can do it like this:

x = 20  # percentage of the first and last rows
y = float(len(df.index))
z = int(round(y/100 *x))
h = int(y-z)
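pos = df.columns.get_loc('B')
# assign through iloc: chained df['B'][:z] = 0 may write to a copy
df.iloc[:z, pos] = 0
df.iloc[h:, pos] = 0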

Upvotes: 0

Allen Qin

Reputation: 19957

Assuming the df has the default integer index (0, 1, 2, ...), you can build a boolean mask from the index:

num_rows = round(len(df)*0.2)

df.loc[(df.index<num_rows) | (df.index[::-1]<num_rows), 'B'] = 0

df
Out[89]: 
   A  B
0  1  0
1  2  7
2  3  8
3  4  4
4  5  0
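
The mask above compares index labels, so it relies on the default integer index. A position-based variant that should work with any index could look like this sketch. The string index and the values in B are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a non-integer index
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [9, 7, 8, 4, 6]},
                  index=list("abcde"))

num_rows = round(len(df) * 0.2)
positions = np.arange(len(df))  # 0..n-1 regardless of the index labels
# True for the first and last num_rows positions
mask = (positions < num_rows) | (positions >= len(df) - num_rows)
df.loc[mask, "B"] = 0
print(df)
```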

Upvotes: 0
