Reputation: 135
I am desperatly trying to figure out how to print out the row index and col name for specific values in my df.
I have the following df:
raw_data = {'first_name': [NaN, 'Molly', 'Tina', 'Jake', 'Amy'],
'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
'age': [42, 52, NaN, 24, 73],
'preTestScore': [4, 24, 31, 33, 3],
'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age',
'preTestScore','postTestScore'])
I now want to print out the index and column name for the NaN:
There is a missing value in row 0 for first_name.
There is a missing value in row 2 for age.
I have searched a lot and always found how to do something for one row. My idea is to first create a df with False and True
na = df.isnull()
Then I want to apply some function that prints the row number and col_name for every NaN value. I just cant figure out how to do this.
Thanks in advance for any help!
Upvotes: 7
Views: 251
Reputation: 4929
As simple as:
np.where(df.isnull())
It returns a tuple with the row indexes, and column indexes with NAs
, respectively.
Example:
na_idx = np.where(df.isnull())
for i,j in zip(*na_idx):
print(f'Row {i} and column {j} ({df.columns[j]}) is NA.')
Upvotes: 3
Reputation: 6179
I think melting is the way to go.
I'd start by creating a dataframe with columns: index, column_name, value.
Then filter column value
by not null.
And dump the result to dict.
df = pd.melt(df.reset_index(), id_vars=['index'], value_vars=df.columns)
selected = df[df['value'].isnull()].drop('value', axis=1).set_index('index')
resp = selected.T.to_dict(orient='records')[0]
s = "There is a missing value in row {idx} for {col_name}."
for record in resp.items():
idx, col_name = record
print(s.format(idx=idx, col_name=col_name))
Upvotes: 1
Reputation: 661
you can just create a variable NaN = "null" to indicate empty cell
import pandas as pd
NaN = "null"
raw_data = {'first_name': [NaN, 'Molly', 'Tina', 'Jake', 'Amy'],
'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
'age': [42, 52, NaN, 24, 73],
'preTestScore': [4, 24, 31, 33, 3],
'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age',
'preTestScore','postTestScore'])
print(df)
output:
first_name last_name age preTestScore postTestScore
0 null Miller 42 4 25
1 Molly Jacobson 52 24 94
2 Tina Ali null 31 57
3 Jake Milner 24 33 62
4 Amy Cooze 73 3 70
Upvotes: 0
Reputation: 3285
You could do something like the below:
for i, row in df.iterrows():
nans = row[row.isna()].index
for n in nans:
print('row: %s, col: %s' % (i, n))
Upvotes: 2
Reputation: 8219
had to change the df a bit because of NaN
. Replaced with np.nan
import numpy as np
import pandas as pd
raw_data = {'first_name': [np.nan, 'Molly', 'Tina', 'Jake', 'Amy'],
'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
'age': [42, 52, np.nan, 24, 73],
'preTestScore': [4, 24, 31, 33, 3],
'postTestScore': [25, 94, 57, 62, 70]}
you can do this
dfs = df.stack(dropna = False)
[f'There is a missing value in row {i[0]} for {i[1]}' for i in dfs[dfs.isna()].index]
prints a list
['There is a missing value in row 0 for first_name',
'There is a missing value in row 2 for age']
Upvotes: 4