Reputation: 4490
I am trying to use numpy where
in conjunction with applymap
in pandas.
Sample DF:
f = [[1,5],[20,40],[100,21],[15,19],[-46,101]]
test = pd.DataFrame(f,columns=["A","B"])
test
OP:
A B
0 1 5
1 20 40
2 100 21
3 15 19
4 -46 101
Condition is, if a column value is greater than 50 or less than 25 it should be changed to 0 or it should remain as it is.
Code:
test = test.applymap(lambda x:np.where((test[x]>50)| (test[x]<25), 0,test[x]) )
test
Error:
KeyError Traceback (most recent call last)
~\AppData\Local\Continuum\miniconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: (1, 'occurred at index A')
Any suggestions will be helpful
Upvotes: 0
Views: 508
Reputation: 4490
sample_df = pd.DataFrame(np.random.randint(1,20,size=(10, 2)), columns=list('BC'))
sample_df["date"]= ["2020-02-01","2020-02-01","2020-02-01","2020-02-01","2020-02-01",
"2020-02-02","2020-02-02","2020-02-02","2020-02-02","2020-02-02"]
sample_df["date"] = pd.to_datetime(sample_df["date"])
sample_df.set_index(sample_df["date"],inplace=True)
sample_df["A"]=[10,10,10,10,10,12,1,3,4,2]
del sample_df["date"]
sample_df
def func(df,n_bins):
try:
proc_col = pd.qcut(df["A"].values, n_bins, labels=range(0,n_bins))
return proc_col
except:
proc_col = pd.qcut(df.mean(axis =1).values, n_bins, labels=range(0,n_bins))
return proc_col
sample_df["A"]=sample_df.groupby([sample_df.index.get_level_values(0)])[["C","A"]].apply(lambda df: func(df,3))
sample_df
B C A
date
2020-02-01 1 16 [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01 5 19 [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01 2 16 [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01 12 11 [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-01 15 10 [1, 2, 1, 0, 0] Categories (3, int64): [0 < 1 ...
2020-02-02 19 17 [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02 17 7 [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02 14 1 [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02 19 13 [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 ...
2020-02-02 15 13 [2, 0, 1, 2, 0] Categories (3, int64): [0 < 1 .
Upvotes: 0
Reputation: 30930
Use DataFrame.mask
:
test.mask(test.lt(25)|test.gt(50),0)
test.where(test.ge(25) & test.le(50),0)
Output
A B
0 0 0
1 0 40
2 0 0
3 0 0
4 0 0
Using DataFrame.applymap
we could do:
test.applymap(lambda x: 0 if (x>50) or (x<25) else x)
but this could become slow for large data frames
Solution with np.where
import numpy as np
pd.DataFrame(np.where((test<25)|(test>50),0,test),index = test.index,columns = test.columns)
EDIT
mean_test = test.mean()
limit = 5
df_filtered = test.mask(test.gt(mean_test.add(limit))|
test.lt(mean_test.sub(limit)),0)
print(df_filtered)
A B
0 0 0
1 20 40
2 0 0
3 15 0
4 0 0
Upvotes: 3