Luk-StackOverflow

Reputation: 361

How to extract the column and row indices of DataFrame entries that satisfy a condition

I would like to store the "coordinates" (column positions and row positions) of all DataFrame entries that fulfill a certain condition; in my case, values greater than 0.8.

Here is my code:

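import numpy as np
import pandas as pd


randValues = np.random.rand(5,5)

df = pd.DataFrame(randValues)
df_bool = df > 0.8


colArray = []
rowArray = []


# iterating over a DataFrame directly yields only the column labels,
# so use .items() to get (column label, boolean column) pairs
for colLabel, dfCol in df_bool.items():
    rows = dfCol[dfCol].index    # row labels where this column is True

    if not rows.empty:
        colArray.extend([colLabel] * len(rows))
        rowArray.extend(rows.tolist())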

Upvotes: 3

Views: 1721

Answers (3)

Sociopath

Reputation: 13401

Use np.where with np.column_stack:

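import numpy as np
import pandas as pd

randValues = np.random.rand(5,5)

df = pd.DataFrame(randValues)
df_bool = df > 0.8

# one row per True entry: column 0 holds the row index, column 1 the column index
ind = np.column_stack(np.where(df_bool))
print(ind)
colArray = ind[:, 1].tolist()    # [2, 3] for the output below
rowArray = ind[:, 0].tolist()    # [0, 1] for the output below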

Output:

[[0 2]
 [1 3]]
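NumPy documents np.argwhere(a) as equivalent to np.transpose(np.nonzero(a)), so it produces the same (row, column) array as np.column_stack(np.where(...)) in a single call. A minimal sketch, using a small hand-written DataFrame (illustrative data assumed here instead of the random values) so the result is reproducible:

```python
import numpy as np
import pandas as pd

# Small fixed DataFrame (illustrative data, not from the question)
df = pd.DataFrame([[0.1, 0.9, 0.3],
                   [0.85, 0.2, 0.95]])
df_bool = df > 0.8

# One (row, col) pair per True entry, in row-major order
ind = np.argwhere(df_bool.to_numpy())
print(ind)
# [[0 1]
#  [1 0]
#  [1 2]]

# Same array as the column_stack/where approach above
assert (ind == np.column_stack(np.where(df_bool))).all()

rowArray = ind[:, 0].tolist()   # [0, 1, 1]
colArray = ind[:, 1].tolist()   # [1, 0, 2]
```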

Upvotes: 0

Mohit Motwani

Reputation: 4792

You can try np.where and zip:

randValues = np.random.rand(5,5)

df = pd.DataFrame(randValues)
df_bool = df > 0.8
df_bool

       0      1      2      3      4
0  False  False  False  False  False
1  False   True  False  False  False
2  False  False   True  False  False
3  False  False  False  False  False
4   True  False  False  False  False

np.where returns the indices where the condition is satisfied: the row indices in the first array and the column indices in the second.

arr = np.where(df_bool)
arr


(array([1, 2, 4], dtype=int64), array([1, 2, 0], dtype=int64))

list(zip(arr[0], arr[1]))
[(1, 1), (2, 2), (4, 0)]

rowArray = arr[0]
colArray = arr[1]
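Because np.where returns one array of row positions and one of column positions, the same pair of arrays can be passed back as a NumPy fancy index to read the matching values. A minimal sketch with a small fixed DataFrame (illustrative data, assumed here so the output is reproducible):

```python
import numpy as np
import pandas as pd

# Illustrative fixed data instead of np.random.rand
df = pd.DataFrame([[0.5, 0.9],
                   [0.81, 0.2]])
rows, cols = np.where(df > 0.8)

# (row, col) coordinate pairs, as with zip in the answer above
pairs = list(zip(rows.tolist(), cols.tolist()))
print(pairs)    # [(0, 1), (1, 0)]

# The same index arrays select the matching values from the underlying array
values = df.to_numpy()[rows, cols]
print(values)
```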

Upvotes: 0

jezrael

Reputation: 862641

Use numpy.where to get the positions; then, if the index/columns are not the default 0..n-1 values, convert the positions to labels by indexing df.index and df.columns:

np.random.seed(2019)
randValues = np.random.rand(5,5)

df = pd.DataFrame(randValues, columns=list('abcde'))
print (df)
          a         b         c         d         e
0  0.903482  0.393081  0.623970  0.637877  0.880499
1  0.299172  0.702198  0.903206  0.881382  0.405750
2  0.452447  0.267070  0.162865  0.889215  0.148476
3  0.984723  0.032361  0.515351  0.201129  0.886011
4  0.513620  0.578302  0.299283  0.837197  0.526650

r, c = np.where(df > 0.8)
print (r)
[0 0 1 1 2 3 3 4]
print (c)
[0 4 2 3 3 0 4 3]

colArray = df.columns.values[c]
print (colArray)
['a' 'e' 'c' 'd' 'd' 'a' 'e' 'd']

rowArray = df.index.values[r]
print (rowArray)
[0 0 1 1 2 3 3 4]
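With labelled rows and columns, the positions from np.where map to labels by indexing df.index with the row array r and df.columns with the column array c. A pandas-only alternative is DataFrame.stack(), which turns the frame into a Series keyed by (row label, column label) pairs, so filtering it yields the label coordinates directly. A sketch with a small labelled DataFrame (the data and the labels 'x', 'y', 'a', 'b', 'c' are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Illustrative frame with non-default row and column labels
df = pd.DataFrame([[0.9, 0.1, 0.85],
                   [0.3, 0.95, 0.2]],
                  index=['x', 'y'], columns=['a', 'b', 'c'])

# Positional approach: convert positions to labels
r, c = np.where(df > 0.8)
rowArray = df.index[r].tolist()     # ['x', 'x', 'y']
colArray = df.columns[c].tolist()   # ['a', 'c', 'b']

# Label approach: stack into a Series indexed by (row, column), then filter
s = df.stack()
coords = s[s > 0.8].index.tolist()
print(coords)   # [('x', 'a'), ('x', 'c'), ('y', 'b')]
```

Both approaches walk the frame row by row, so the pairs come out in the same order; recent pandas versions may emit a FutureWarning about the stack() implementation, which does not affect this NaN-free example.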

Upvotes: 1
