Reputation: 5126
I would like to get the row and column number based on a given condition. The "coordinates" if you will.
import pandas as pd
import numpy as np

dfp = pd.DataFrame({'A' : [1, 21, 8, 44, np.nan, 6, 75, 8, 44, 999],
                    'B' : [1, 1, 3, 5, 0, 0, np.nan, 9, 0, 0],
                    'C' : ['AA1233445', 'AA1233445', 'rmacy', 'Idaho Rx', 'Ab123455', 'TV192837', 'RX', 'Ohio Drugs', 'RX12345', 'USA Pharma'],
                    'D' : [123456, 123456, 1234567, 12345678, 12345, 12345, 12345678, 123456789, 1234567, np.nan],
                    'E' : ['Assign', 'Assign', 'Hello', 'Ugly', 'Appreciate', 'Undo', 'Testing', 'Unicycle', 'Pharma', 'Unicorn']})
print(dfp)
A B C D E
0 1.0 1.0 AA1233445 123456.0 Assign
1 21.0 1.0 AA1233445 123456.0 Assign
2 8.0 3.0 rmacy 1234567.0 Hello
3 44.0 5.0 Idaho Rx 12345678.0 Ugly
4 NaN 0.0 Ab123455 12345.0 Appreciate
5 6.0 0.0 TV192837 12345.0 Undo
6 75.0 NaN RX 12345678.0 Testing
7 8.0 9.0 Ohio Drugs 123456789.0 Unicycle
8 44.0 0.0 RX12345 1234567.0 Pharma
9 999.0 0.0 USA Pharma NaN Unicorn
I can get my output by doing:
print(dfp.loc[dfp['B'].isnull()].index.values[0] + 1, ',',
      int([i for i, x in enumerate(dfp.columns.tolist()) if x == 'B'][0] + 1))
but the issue is that B
could have multiple nulls, and I would like the coordinates of all of them.
Is there a way to do this using DataFrame.loc
or something similar? Adding 1 to the values is not a big deal; I can do that easily later on.
Upvotes: 1
Views: 1678
Reputation: 29307
You could use
dfp[pd.isnull(dfp['B'])].index.tolist()
To add 1 concisely, you could use:
np.asarray(dfp[pd.isnull(dfp['B'])].index) + 1
print(np.asarray(dfp[pd.isnull(dfp['B'])].index) + 1)
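As a self-contained sketch of what the two lines above do (using a small, made-up frame rather than the question's dfp):

```python
import numpy as np
import pandas as pd

# Hypothetical minimal frame: column 'B' has nulls at positions 1 and 3
df = pd.DataFrame({'B': [1.0, np.nan, 3.0, np.nan]})

zero_based = df[pd.isnull(df['B'])].index.tolist()
one_based = (np.asarray(zero_based) + 1).tolist()

print(zero_based)  # [1, 3]
print(one_based)   # [2, 4]
```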
To include the index of column B (dfp.columns.get_loc("B") + 1):
for x in np.asarray(dfp[pd.isnull(dfp['B'])].index) + 1:
    print(str(x) + ',' + str(dfp.columns.get_loc("B") + 1))
To find "NaN"s in a given list of columns:
def find_NaN(list_col):
    for c in list_col:
        if c in dfp.columns:
            for x in np.asarray(dfp[pd.isnull(dfp[c])].index) + 1:
                print(str(x) + ',' + str(dfp.columns.get_loc(c) + 1))
find_NaN(["A","B"])
5,1
7,2
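If you'd rather collect the coordinates than print them, a sketch of the same loop that returns a list of (row, column) tuples (keeping the one-based convention; find_nan_coords and the demo frame are my own names, not from the question) could look like:

```python
import numpy as np
import pandas as pd

def find_nan_coords(df, list_col):
    """Return one-based (row, column) coordinates of NaNs in the given columns."""
    coords = []
    for c in list_col:
        if c in df.columns:
            col_pos = df.columns.get_loc(c) + 1
            for x in np.asarray(df[pd.isnull(df[c])].index) + 1:
                coords.append((int(x), col_pos))
    return coords

# Tiny demo frame (not the question's dfp): A has NaN at row 1, B at row 0
demo = pd.DataFrame({'A': [1.0, np.nan], 'B': [np.nan, 2.0]})
print(find_nan_coords(demo, ['A', 'B']))  # [(2, 1), (1, 2)]
```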
Some explanations
dfp[pd.isnull(dfp['B'])]
selects the rows of the dataframe using a boolean array (mask).
dfp.columns.get_loc(c)
gives the index of column c
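For instance, both pieces on a throwaway two-column frame (my own illustration, not from the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0], 'B': [np.nan, 4.0]})

mask = pd.isnull(df['B'])        # boolean Series: [True, False]
print(df[mask].index.tolist())   # [0] -> rows where B is null
print(df.columns.get_loc('B'))   # 1  -> zero-based position of column 'B'
```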
def find_NaN(list_col):
    for c in list_col:
        # if column is one of the dataframe's columns
        if c in dfp.columns:
            # for each index x where column c of the dataframe is null
            for x in np.asarray(dfp[pd.isnull(dfp[c])].index) + 1:
                print(str(x) + ',' + str(dfp.columns.get_loc(c) + 1))
Upvotes: 2
Reputation: 294218
I'd use a combination of np.where
and zip:
i, j = np.where(dfp.isnull().values)
# Coordinates in the space of the actual index and column names
list(zip(dfp.index[i], dfp.columns[j]))
[(4, 'A'), (6, 'B'), (9, 'D')]
Otherwise, stick to ordinal positions with
list(zip(i, j))
[(4, 0), (6, 1), (9, 3)]
Or
np.column_stack([i, j])
array([[4, 0],
[6, 1],
[9, 3]])
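If you want the one-based coordinates the question asked for, adding 1 to the stacked array is enough. A sketch on a tiny hypothetical frame (not the question's dfp):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan], 'B': [np.nan, 2.0]})

# i, j are the zero-based row and column positions of every NaN
i, j = np.where(df.isnull().values)
print(np.column_stack([i, j]) + 1)
# [[1 2]
#  [2 1]]
```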
Upvotes: 2