Reputation: 43
I am attempting to compare two columns within the same data frame and let me know if the value is outside 2 standard deviations of a rolling 7 day window.
I have the following code to make a copy of the current column and take its standard deviation
ma_columns = ['Carrier_AcctPswd', 'Carrier_Activation',
'Carrier_AddRemove_Feature', 'Carrier_Billing', 'Carrier_Call_Quality',
'Carrier_Client_Specific_App', 'Carrier_DataUsage', 'Carrier_Dev_Rpl',
'Carrier_Wty_Equip', 'Content_Sync_Various', 'Dev_Hardware_Other',
'Dev_Hardware_Screen', 'Dev_Issue_Various', 'Dev_Setting_Acct',
'Dev_Setting_Battery', 'Dev_Setting_Other', 'Education', 'Email',
'General_QA', 'Make_Receive_Calls', 'Messaging', 'New_Device_Setup',
'Ntwk_Data_Connection', 'Other', 'Unknown', 'Voice', 'Voicemail',
'WiFi']
for column in ma_columns:
st_column = column + '_stdev'
original_data[st_column] = pd.DataFrame(original_data[column].rolling(window=7).std())
print(original_data)
I now want to match the respective value column to two standard deviations to its rolling column and see if the current value is outside of two standard deviations. EX: "Wifi >= 2x Wifi_stdev | Wifi <= 2x WiFi_stdev". I want to iterate over every column for each variable (about 28) and I am using np.where. Using this code below:
for column in original_data:
original_data['alert'] = np.where((original_data[original_data.columns[column]] >= 2*original_data[original_data.columns[column+'_stdev']]), 'T', 'F')
I get this error:
<ipython-input-14-82df91b172c5> in <module>
1 for column in original_data:
----> 2 original_data['alert'] = np.where(((original_data[original_data.columns[column]] >= original_data[original_data.columns[column+'_ma']]) or (original_data[original_data.columns[column]] <= original_data[original_data.columns[column+'_ma']])) , 'T', 'F')
3
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __getitem__(self, key)
3928 if is_scalar(key):
3929 key = com.cast_scalar_indexer(key)
-> 3930 return getitem(key)
3931
3932 if isinstance(key, slice):
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
I want this to go into a new column called 'alert' as 'T' if it is outside 2 std or 'F' otherwise.
Upvotes: 0
Views: 188
Reputation: 31166
As I understand it you want to build up alert by checking each of the columns is within 2 std deviations. Your logic I believe needs to be modified as stddev in absence of mean is not statistically anything. (Normal distribution)
alert==True
as the loop progressesimport numpy as np
ma_columns = ['Carrier_AcctPswd', 'Carrier_Activation',
'Carrier_AddRemove_Feature', 'Carrier_Billing', 'Carrier_Call_Quality',
'Carrier_Client_Specific_App', 'Carrier_DataUsage', 'Carrier_Dev_Rpl',
'Carrier_Wty_Equip', 'Content_Sync_Various', 'Dev_Hardware_Other',
'Dev_Hardware_Screen', 'Dev_Issue_Various', 'Dev_Setting_Acct',
'Dev_Setting_Battery', 'Dev_Setting_Other', 'Education', 'Email',
'General_QA', 'Make_Receive_Calls', 'Messaging', 'New_Device_Setup',
'Ntwk_Data_Connection', 'Other', 'Unknown', 'Voice', 'Voicemail',
'WiFi']
ma_columns = ma_columns
df = pd.DataFrame([{c:random.randint(10, 11)*random.uniform(0.98,1.02) for c in ma_columns} for i in range(40)])
# just to keep track of what went on...
dflog = pd.DataFrame({"col":[], "alerts":[]}).astype({"alerts":"int64"})
# initialise....
df = df.assign(alert=False)
for c in ma_columns:
df = df.assign(
# calc upper and lower bound for current column
upper=lambda dfa: dfa[c].rolling(window=7).std()*2+dfa[c].rolling(window=7).mean(),
lower=lambda dfa: dfa[c].rolling(window=7).mean()-dfa[c].rolling(window=7).std()*2,
# keep previous alert, else check current column...
alert=lambda dfa: np.where(dfa["alert"], dfa["alert"], (dfa[c]>=dfa["upper"]) | (dfa[c]<=dfa["lower"])),
).drop(columns=["upper","lower"])
# not needed but shows what is going on
dflog = dflog.append({"col":c, "alerts":df.loc[:,["alert"]].sum().values[0]}, ignore_index=True)
output - what happened
col Carrier_AcctPswd Carrier_Activation Carrier_AddRemove_Feature Carrier_Billing Carrier_Call_Quality Carrier_Client_Specific_App Carrier_DataUsage Carrier_Dev_Rpl Carrier_Wty_Equip Content_Sync_Various Dev_Hardware_Other Dev_Hardware_Screen Dev_Issue_Various Dev_Setting_Acct Dev_Setting_Battery Dev_Setting_Other Education Email General_QA Make_Receive_Calls Messaging New_Device_Setup Ntwk_Data_Connection Other Unknown Voice Voicemail WiFi
alerts 1 1 1 2 2 2 2 3 3 3 3 6 7 7 7 7 7 7 7 8 8 9 10 10 10 10 11 12
output sample
alert Carrier_AcctPswd Carrier_Activation Carrier_AddRemove_Feature Carrier_Billing Carrier_Call_Quality Carrier_Client_Specific_App Carrier_DataUsage Carrier_Dev_Rpl Carrier_Wty_Equip Content_Sync_Various Dev_Hardware_Other Dev_Hardware_Screen Dev_Issue_Various Dev_Setting_Acct Dev_Setting_Battery Dev_Setting_Other Education Email General_QA Make_Receive_Calls Messaging New_Device_Setup Ntwk_Data_Connection Other Unknown Voice Voicemail WiFi
False 10.027893 11.180526 11.036345 11.053842 10.784546 10.147791 11.025660 10.058375 10.990958 10.090822 10.086956 9.896482 10.148533 10.789600 9.982659 11.109153 10.969188 10.950963 9.945559 10.072013 10.810217 10.144920 10.954820 9.963897 10.923819 11.112625 11.138334 10.850437
True 10.078542 10.176072 9.888113 10.187253 9.874345 10.192872 10.015289 10.147271 10.050714 10.880132 10.963420 10.147965 9.951384 11.183593 10.974927 9.907986 10.901859 9.975108 10.108487 10.158522 9.892944 10.957816 9.985727 10.160018 9.932426 10.129687 10.930133 10.144481
False 10.115461 11.108922 10.979528 10.867064 9.966132 11.093651 10.015981 9.840299 10.044067 10.936187 10.880957 10.153535 11.031433 9.834146 10.189683 11.024977 10.117397 10.786052 10.933146 11.203619 9.838307 10.112361 10.159005 11.187097 10.099687 9.974852 10.162924 11.142788
False 10.837826 9.882999 11.028710 10.082859 9.926954 10.151154 9.989557 10.032212 10.103652 10.073667 10.048499 9.941714 9.824670 10.033737 10.043848 9.997618 9.915864 10.057772 10.084293 10.107702 11.072084 11.205637 11.011373 9.837888 11.028028 10.192616 10.123548 9.860274
True 10.183415 10.044214 10.180521 11.017100 10.162875 10.926751 9.905115 11.150269 10.853725 10.888850 9.971950 9.952609 10.175848 11.049947 10.113041 10.926425 10.158558 11.055091 10.917608 10.921025 9.917453 9.937600 11.089962 11.006900 11.054964 11.099928 11.015369 10.094890
True 11.110304 10.067525 10.804917 10.122729 10.786908 10.087151 11.130379 10.165037 10.067856 9.813646 9.992265 11.164745 9.987881 10.848681 9.934480 9.875899 11.128687 10.076104 11.106895 11.084006 10.960849 10.834289 11.195834 10.082106 10.043681 10.837019 10.036509 10.952554
False 9.876790 9.868787 10.822574 11.155736 10.176943 11.173443 10.929224 9.942439 10.023803 10.060724 11.147590 10.981729 9.988948 10.912586 11.009330 11.144846 9.943222 10.821988 10.114765 9.817857 10.073799 10.984551 11.059509 11.013951 10.025922 10.789054 10.828943 11.091649
Upvotes: 1