MattR

Reputation: 5126

Get row and column number from Pandas Condition

I would like to get the row and column number based on a given condition. The "coordinates" if you will.

import pandas as pd
import numpy as np

dfp = pd.DataFrame({'A': [1, 21, 8, 44, np.nan, 6, 75, 8, 44, 999],
                    'B': [1, 1, 3, 5, 0, 0, np.nan, 9, 0, 0],
                    'C': ['AA1233445', 'AA1233445', 'rmacy', 'Idaho Rx', 'Ab123455', 'TV192837', 'RX', 'Ohio Drugs', 'RX12345', 'USA Pharma'],
                    'D': [123456, 123456, 1234567, 12345678, 12345, 12345, 12345678, 123456789, 1234567, np.nan],
                    'E': ['Assign', 'Assign', 'Hello', 'Ugly', 'Appreciate', 'Undo', 'Testing', 'Unicycle', 'Pharma', 'Unicorn']})
print(dfp)
print(dfp)

       A    B           C            D           E
0    1.0  1.0   AA1233445     123456.0      Assign
1   21.0  1.0   AA1233445     123456.0      Assign
2    8.0  3.0       rmacy    1234567.0       Hello
3   44.0  5.0    Idaho Rx   12345678.0        Ugly
4    NaN  0.0    Ab123455      12345.0  Appreciate
5    6.0  0.0    TV192837      12345.0        Undo
6   75.0  NaN          RX   12345678.0     Testing
7    8.0  9.0  Ohio Drugs  123456789.0    Unicycle
8   44.0  0.0     RX12345    1234567.0      Pharma
9  999.0  0.0  USA Pharma          NaN     Unicorn

I can get my output by doing:

print(dfp.loc[dfp['B'].isnull()].index.values[0] + 1, ',',
      int([i for i, x in enumerate(dfp.columns.tolist()) if x == 'B'][0] + 1))

but the issue is that B could have multiple nulls, and I would like the coordinates of all of them.

Is there a way to do this using dataframe.loc or something similar? Adding 1 to the values is not a big deal; I can do that easily later on.

Upvotes: 1

Views: 1678

Answers (2)

user2314737

Reputation: 29307

You could use

dfp[pd.isnull(dfp['B'])].index.tolist()

For adding 1 concisely you could use:

np.asarray(dfp[pd.isnull(dfp['B'])].index) + 1

and print it with:

print(np.asarray(dfp[pd.isnull(dfp['B'])].index) + 1)

To include the index of column B (dfp.columns.get_loc("B") + 1):

for x in np.asarray(dfp[pd.isnull(dfp['B'])].index) + 1:
    print(str(x)+','+str(dfp.columns.get_loc("B") + 1))

To find "NaN"s in a given list of columns:

def find_NaN(list_col):
    for c in list_col:
        if c in dfp.columns:
            for x in np.asarray(dfp[pd.isnull(dfp[c])].index) + 1:
                print(str(x)+','+str(dfp.columns.get_loc(c) + 1))

find_NaN(["A","B"])
5,1
7,2

Some explanations

dfp[pd.isnull(dfp['B'])] selects data from the dataframe using an array of boolean values.

dfp.columns.get_loc(c) gives the integer position of column c.
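
As a quick illustration of both pieces (using a throwaway two-row frame, not the question's dfp):

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2], 'B': [np.nan, 5]})
mask = pd.isnull(df['B'])       # Series of booleans: [True, False]
print(df[mask])                 # keeps only the row where B is NaN
print(df.columns.get_loc('B'))  # 1, the ordinal position of column 'B'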

def find_NaN(list_col):
    for c in list_col:
        # if column is one of the dataframe's columns
        if c in dfp.columns:
            # for each index x where column c of the dataframe is null
            for x in np.asarray(dfp[pd.isnull(dfp[c])].index) + 1:
                print(str(x)+','+str(dfp.columns.get_loc(c) + 1))
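
If you'd rather collect the coordinates than print them, the same logic can return a list of tuples (a small variant sketch, not from the original answer):

def find_nan_coords(df, list_col):
    # same traversal as find_NaN, but accumulates 1-based (row, column) pairs
    coords = []
    for c in list_col:
        if c in df.columns:
            for x in np.asarray(df[pd.isnull(df[c])].index) + 1:
                coords.append((x, df.columns.get_loc(c) + 1))
    return coords

print(find_nan_coords(dfp, ["A", "B"]))  # [(5, 1), (7, 2)]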

Upvotes: 2

piRSquared

Reputation: 294218

I'd use a combination of np.where and zip:

i, j = np.where(dfp.isnull().values)

# Coordinates in the space of the actual index and column names
list(zip(dfp.index[i], dfp.columns[j]))

[(4, 'A'), (6, 'B'), (9, 'D')]

Otherwise, stick to ordinal positions with

list(zip(i, j))

[(4, 0), (6, 1), (9, 3)]

Or

np.column_stack([i, j])

array([[4, 0],
       [6, 1],
       [9, 3]])
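
As a quick sanity check (my addition, not part of the original answer), the ordinal pairs can be fed straight back through iloc; each lookup should come out as NaN:

for r, c in zip(i, j):
    print(r, c, dfp.iloc[r, c])  # 4 0 nan / 6 1 nan / 9 3 nan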

Upvotes: 2
