Reputation: 361
I would like to store the "coordinates" (row positions and column positions) of all DataFrame entries that fulfill a certain condition. In my case, the condition is that the value is greater than 0.8.
Here is my code:
import numpy as np
import pandas as pd
randValues = np.random.rand(5,5)
df = pd.DataFrame(randValues)
df_bool = df > 0.8
colArray = np.empty([])
rowArray = np.empty([])
for dfIdx, dfCol in enumerate(df_bool):
    row = dfCol.loc[dfCol['1'] == True]
    if ~row.isempty():
        colArray.append(dfIdx)
        rowArray.append(row)
Upvotes: 3
Views: 1721
Reputation: 13401
Use np.where with np.column_stack:
randValues = np.random.rand(5,5)
df = pd.DataFrame(randValues)
df_bool = df > 0.8
ind = np.column_stack(np.where(df_bool))
print(ind)
colArray = [i[1] for i in ind] # [2,3]
rowArray = [i[0] for i in ind] # [0,1]
Output (example; the actual values depend on the random data):
[[0 2]
 [1 3]]
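As a side note, np.argwhere should give the same (row, column) pairs in a single call. Here is a minimal sketch, assuming a small DataFrame with fixed values (made up for illustration, not taken from the question) so the result is reproducible:

```python
import numpy as np
import pandas as pd

# Fixed illustrative values instead of random ones, so the result is reproducible.
df = pd.DataFrame([[0.1, 0.2, 0.9],
                   [0.3, 0.95, 0.4]])

# Boolean mask as a plain NumPy array.
mask = (df > 0.8).to_numpy()

# Approach from the answer: stack the row and column index arrays side by side.
pairs_stacked = np.column_stack(np.where(mask))

# Alternative: np.argwhere returns one (row, column) pair per True entry.
pairs_argwhere = np.argwhere(mask)

print(pairs_argwhere)
# [[0 2]
#  [1 1]]
```

Both arrays have one row per match, with the row position in the first column and the column position in the second.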
Upvotes: 0
Reputation: 4792
You can try np.where and zip:
randValues = np.random.rand(5,5)
df = pd.DataFrame(randValues)
df_bool = df > 0.8
df_bool
       0      1      2      3      4
0  False  False  False  False  False
1  False   True  False  False  False
2  False  False   True  False  False
3  False  False  False  False  False
4   True  False  False  False  False
np.where returns the indices where the condition is satisfied, with the row indices in the first array and the column indices in the second:
arr = np.where(df_bool)
arr
(array([1, 2, 4], dtype=int64), array([1, 2, 0], dtype=int64))
list(zip(arr[0], arr[1]))
[(1, 1), (2, 2), (4, 0)]
rowArray = arr[0]
colArray = arr[1]
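Because the data above is random, the output changes on every run. The following is a reproducible sketch of the same idea, assuming hand-picked values (hypothetical, chosen to match the boolean frame shown above). It also uses the two index arrays to look up the matching values:

```python
import numpy as np
import pandas as pd

# Hand-picked values so that exactly (1, 1), (2, 2) and (4, 0) exceed 0.8.
values = np.zeros((5, 5))
values[1, 1] = values[2, 2] = values[4, 0] = 0.9
df = pd.DataFrame(values)

# Row positions come first, column positions second.
rows, cols = np.where((df > 0.8).to_numpy())

# Pair the positions up as plain Python ints.
coords = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(coords)  # [(1, 1), (2, 2), (4, 0)]

# The same two arrays can index the underlying values directly.
values_hit = df.to_numpy()[rows, cols]
print(values_hit)  # [0.9 0.9 0.9]
```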
Upvotes: 0
Reputation: 862641
Use numpy.where to get the positions, then select the labels by indexing if the index/columns are not the default values:
np.random.seed(2019)
randValues = np.random.rand(5,5)
df = pd.DataFrame(randValues, columns=list('abcde'))
print (df)
          a         b         c         d         e
0  0.903482  0.393081  0.623970  0.637877  0.880499
1  0.299172  0.702198  0.903206  0.881382  0.405750
2  0.452447  0.267070  0.162865  0.889215  0.148476
3  0.984723  0.032361  0.515351  0.201129  0.886011
4  0.513620  0.578302  0.299283  0.837197  0.526650
r, c = np.where(df > 0.8)
print (r)
[0 0 1 1 2 3 3 4]
print (c)
[0 4 2 3 3 0 4 3]
colArray = df.columns.values[c]
print (colArray)
['a' 'e' 'c' 'd' 'd' 'a' 'e' 'd']
rowArray = df.index.values[r]
print (rowArray)
[0 0 1 1 2 3 3 4]
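With the default integer index, the row labels are identical to the row positions, so the label lookup on the index changes nothing. A minimal sketch with a non-default index makes the difference visible. The labels "x", "y", "z" and the values are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative frame with string labels on both axes.
df = pd.DataFrame(
    [[0.9, 0.1, 0.85],
     [0.2, 0.3, 0.4],
     [0.1, 0.95, 0.2]],
    index=["x", "y", "z"],
    columns=list("abc"),
)

# Integer positions of all entries greater than 0.8.
r, c = np.where((df > 0.8).to_numpy())

# Row positions index into the index, column positions into the columns.
row_labels = df.index.to_numpy()[r]
col_labels = df.columns.to_numpy()[c]

label_pairs = list(zip(row_labels.tolist(), col_labels.tolist()))
print(label_pairs)  # [('x', 'a'), ('x', 'c'), ('z', 'b')]
```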
Upvotes: 1