Reputation: 6176
I have a DataFrame with a column whose values are NumPy arrays.
The data is as follows:
data
0 [1, 2, 2, 3, 4, 2]
1 [2, 4, 2, 5, 2, 3, 2]
2 [2, 2, 2, 8, 2, 3, 2, 9, 1]
...
For each array in the column, I would like to get the indices of the elements that satisfy the condition (> (mean + std)) or (< (mean - std)).
The output I expect is as follows:
data index
0 [1, 2, 2, 3, 4, 2] [0,4]
1 [2, 4, 2, 5, 2, 3, 2] [1,3]
2 [2, 2, 2, 8, 2, 3, 2, 9, 1] [3,7]
...
My code is like this:
df['index'] = df['data'].map(lambda x: np.where(((x > x.mean() + x.std()) or (x < x.mean() - x.std())))[0])
But it raises an error:
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
When I use only half of the condition (such as > (mean + std)), there is no problem, so I guess my expression is wrong, but I don't know how to change it.
Can someone help me? Thanks in advance
Upvotes: 1
Views: 143
Reputation: 863166
I think you need np.logical_or.reduce, because Python's or cannot combine two boolean arrays element-wise:
df['index'] = df['data'].map(
    lambda x: np.where(np.logical_or.reduce((x > x.mean() + x.std(),
                                             x < x.mean() - x.std())))[0])
print (df)
data index
0 [1, 2, 2, 3, 4, 2] [0, 4]
1 [2, 4, 2, 5, 2, 3, 2] [1, 3]
2 [2, 2, 2, 8, 2, 3, 2, 9, 1] [3, 7]
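For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the column holds 1-D NumPy arrays as in the question; the helper name outlier_idx is made up for illustration and is not from the original code. It uses the element-wise | operator, which for boolean arrays gives the same result as np.logical_or, and np.flatnonzero in place of np.where(...)[0].

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; each cell is a 1-D NumPy array.
df = pd.DataFrame({'data': [np.array([1, 2, 2, 3, 4, 2]),
                            np.array([2, 4, 2, 5, 2, 3, 2]),
                            np.array([2, 2, 2, 8, 2, 3, 2, 9, 1])]})

def outlier_idx(x):
    """Hypothetical helper: positions more than one std away from the mean."""
    m = x.mean()
    s = x.std()  # NumPy default is the population std (ddof=0)
    # `|` works element-wise on boolean arrays; Python's `or` does not.
    mask = (x > m + s) | (x < m - s)
    return np.flatnonzero(mask)

df['index'] = df['data'].map(outlier_idx)
print(df)
```

Note that x.std() in NumPy defaults to ddof=0, while pandas' Series.std defaults to ddof=1, so the thresholds would differ slightly if the data were stored as Series instead of arrays.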
To verify the solution, look at the two boolean masks separately:
df['index'] = df['data'].map(lambda x: ((x > x.mean() + x.std())))
df['index1'] = df['data'].map(lambda x: ((x < x.mean() - x.std())))
# https://stackoverflow.com/a/33375383/2901002
with pd.option_context('display.max_colwidth', 200):
    print(df)
data \
0 [1, 2, 2, 3, 4, 2]
1 [2, 4, 2, 5, 2, 3, 2]
2 [2, 2, 2, 8, 2, 3, 2, 9, 1]
index \
0 [False, False, False, False, True, False]
1 [False, True, False, True, False, False, False]
2 [False, False, False, True, False, False, False, True, False]
index1
0 [True, False, False, False, False, False]
1 [False, False, False, False, False, False, False]
2 [False, False, False, False, False, False, False, False, False]
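As a small sketch of why the original expression fails and how the two masks combine, here is the first row on its own. The variable names upper, lower and combined are only for illustration. Python's or needs a single True/False for its left operand, so it calls bool() on the whole array, which is what raises the error from the question; np.logical_or compares element by element instead.

```python
import numpy as np

x = np.array([1, 2, 2, 3, 4, 2])  # first row from the question
upper = x > x.mean() + x.std()
lower = x < x.mean() - x.std()

# `or` asks for bool(upper), which is ambiguous for a multi-element array.
try:
    upper or lower
except ValueError as exc:
    print(exc)

# Element-wise OR of the two masks, then the positions that are True.
combined = np.logical_or(upper, lower)
print(combined)               # [ True False False False  True False]
print(np.where(combined)[0])  # [0 4]
```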
Upvotes: 1