Mark McGown

Reputation: 1115

How to get row,column list of tuples from DataFrame?

I'm trying to get a list of (row, column) tuples for the cells of a DataFrame that meet some criterion.

I referenced this posting: Get column and row index pairs of Pandas DataFrame matching some criteria

import numpy as np
import pandas as pd

A = pd.DataFrame([(1.0,0.8,0.6708203932499369,0.6761234037828132,0.7302967433402214),
                  (0.8,1.0,0.6708203932499369,0.8451542547285166,0.9128709291752769),
                  (0.6708203932499369,0.6708203932499369,1.0,0.5669467095138409,0.6123724356957946),
                  (0.6761234037828132,0.8451542547285166,0.5669467095138409,1.0,0.9258200997725514),
                  (0.7302967433402214,0.9128709291752769,0.6123724356957946,0.9258200997725514,1.0)
                  ])

c2 = A.copy()
c2.values[np.tril_indices_from(c2)] = np.nan
[(c2.index[i], c2.columns[j]) for i, j in np.argwhere(c2 > 0.8)]

This raises the following error:

Shape of passed values is (2, 3), indices imply (5, 5)

What am I doing incorrectly?

Upvotes: 1

Views: 1137

Answers (3)

Carles

Reputation: 2829

I would use np.column_stack(np.where(condition)) to do the trick:

import pandas as pd
import numpy as np

A = pd.DataFrame([(1.0,0.8,0.6708203932499369,0.6761234037828132,0.7302967433402214),
                  (0.8,1.0,0.6708203932499369,0.8451542547285166,0.9128709291752769),
                  (0.6708203932499369,0.6708203932499369,1.0,0.5669467095138409,0.6123724356957946),
                  (0.6761234037828132,0.8451542547285166,0.5669467095138409,1.0,0.9258200997725514),
                  (0.7302967433402214,0.9128709291752769,0.6123724356957946,0.9258200997725514,1.0)
                  ])

c2 = A.copy()
c2.values[np.tril_indices_from(c2)] = np.nan

np.column_stack(np.where(c2>0.8))
Out[4]: 
array([[1, 3],
       [1, 4],
       [3, 4]], dtype=int64)
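
Note that the array above holds integer positions, not index/column labels. If the DataFrame has non-default labels, the positions still need to be mapped back. A minimal sketch of that step, assuming made-up labels "a" to "e" and values rounded from the question's matrix (both are my own choices for illustration, not part of the question):

```python
import numpy as np
import pandas as pd

# Hypothetical labels, chosen only so that positions and labels differ.
labels = list("abcde")
A = pd.DataFrame(
    [[1.00, 0.80, 0.67, 0.68, 0.73],
     [0.80, 1.00, 0.67, 0.85, 0.91],
     [0.67, 0.67, 1.00, 0.57, 0.61],
     [0.68, 0.85, 0.57, 1.00, 0.93],
     [0.73, 0.91, 0.61, 0.93, 1.00]],
    index=labels, columns=labels,
)

# Work on a writable NumPy copy and blank out the lower triangle
# (diagonal included), as in the answer above.
arr = A.to_numpy(copy=True)
arr[np.tril_indices_from(arr)] = np.nan

# np.where returns one array of row positions and one of column positions.
rows, cols = np.where(arr > 0.8)

# Map the positions back to the DataFrame's labels.
pairs = list(zip(A.index[rows], A.columns[cols]))
```

With the default integer labels from the question, positions and labels coincide, so the extra mapping only matters for labelled data.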

Upvotes: 2

Vishnudev Krishnadas

Reputation: 10970

You may want to pass the underlying NumPy array to np.argwhere rather than the DataFrame itself, i.e. c2.values:

[(c2.index[i], c2.columns[j]) for i, j in np.argwhere(c2.values > 0.8)]
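
For completeness, here is a self-contained sketch of the same idea. It is only one way to write it: it uses to_numpy() instead of .values and builds the upper-triangle mask with np.triu rather than writing NaN into c2, which also sidesteps in-place writes through .values (I believe those can hit a read-only array in some pandas configurations). The values are rounded from the question's matrix.

```python
import numpy as np
import pandas as pd

# Values rounded from the question's matrix, for brevity.
A = pd.DataFrame(
    [[1.00, 0.80, 0.67, 0.68, 0.73],
     [0.80, 1.00, 0.67, 0.85, 0.91],
     [0.67, 0.67, 1.00, 0.57, 0.61],
     [0.68, 0.85, 0.57, 1.00, 0.93],
     [0.73, 0.91, 0.61, 0.93, 1.00]]
)

# Boolean mask of the strict upper triangle (k=1 excludes the diagonal).
upper = np.triu(np.ones(A.shape, dtype=bool), k=1)

# Compare on a plain NumPy array so np.argwhere returns an (n, 2) array
# of integer positions.
hits = np.argwhere((A.to_numpy() > 0.8) & upper)

# Translate positions into (index label, column label) tuples.
pairs = [(A.index[i], A.columns[j]) for i, j in hits]
```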

Upvotes: 2

ALollz

Reputation: 59579

You can mask the DataFrame and then stack it, which leaves you with the MultiIndex tuples of (index, column) that satisfied the condition.

m = A.gt(0.8) & np.triu(np.ones(A.shape), k=1).astype('bool')
A[m].stack().index.tolist()
#[(1, 3), (1, 4), (3, 4)]
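
The snippet above relies on stack() discarding the NaN cells left by the mask. A slightly more defensive sketch, assuming you want the result to be independent of whether stack() drops NaN in your pandas version, adds an explicit dropna() (values again rounded from the question's matrix):

```python
import numpy as np
import pandas as pd

# Values rounded from the question's matrix, for brevity.
A = pd.DataFrame(
    [[1.00, 0.80, 0.67, 0.68, 0.73],
     [0.80, 1.00, 0.67, 0.85, 0.91],
     [0.67, 0.67, 1.00, 0.57, 0.61],
     [0.68, 0.85, 0.57, 1.00, 0.93],
     [0.73, 0.91, 0.61, 0.93, 1.00]]
)

# True only above the diagonal and where the value exceeds 0.8.
m = A.gt(0.8) & np.triu(np.ones(A.shape, dtype=bool), k=1)

# where() turns every non-matching cell into NaN; dropna() removes those
# cells explicitly after stacking, leaving only the matching (row, column) keys.
pairs = A.where(m).stack().dropna().index.tolist()
```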

Upvotes: 1
