mae
mae

Reputation: 57

How to filter record with condition blank field in Pandas

I newbie with phython, I have I have the following table TABLE.INFO:

ID_STUDENT|CURRENCY|ID_COURSE|STATUS
A12345|IDR|MATH|ACTIVE
A12346|||INACTIVE
A12347|SGD|MATH|ACTIVE
A12348||ENGLISH|ACTIVE
A12349|IDR||INACTIVE
B12345|IDR|ENGLISH|
B12346|IDR|PHYSICS|ACTIVE
B12347||PHYSICS|
B12348|THB||ACTIVE
B12349|||

I want to achieve the result which selects all field on TABLE.INFO with condition only having any field blank:

ID_STUDENT|CURRENCY|ID_COURSE|STATUS
A12346|||INACTIVE
A12346|||INACTIVE
A12348||ENGLISH|ACTIVE
A12349|IDR||INACTIVE
B12345|IDR|ENGLISH|
B12347||PHYSICS|
B12348|THB||ACTIVE
B12349|||

How can I achieve this using pandas phyton? Can someone help with the script? Thank you before.

Upvotes: 0

Views: 62

Answers (2)

chmielcode
chmielcode

Reputation: 111

import io

import pandas as pd

data = R"""ID_STUDENT|CURRENCY|ID_COURSE|STATUS
A12345|IDR|MATH|ACTIVE
A12346|||INACTIVE
A12347|SGD|MATH|ACTIVE
A12348||ENGLISH|ACTIVE
A12349|IDR||INACTIVE
B12345|IDR|ENGLISH|
B12346|IDR|PHYSICS|ACTIVE
B12347||PHYSICS|
B12348|THB||ACTIVE
B12349|||"""
df = pd.read_csv(io.StringIO(data), sep='|')
# if you care about the number on NaNs (> 0)
df.loc[df.isna().sum(axis=1) > 0]
# if not
df.loc[df.isna().any(axis=1)]

Upvotes: 0

oppressionslayer
oppressionslayer

Reputation: 7204

Try this, it should work:

df
#  ID_STUDENT CURRENCY ID_COURSE    STATUS
#0     A12345      IDR      MATH    ACTIVE
#1     A12346      NaN       NaN  INACTIVE
#2     A12347      SGD      MATH    ACTIVE
#3     A12348      NaN   ENGLISH    ACTIVE
#4     A12349      IDR       NaN  INACTIVE
#5     B12345      IDR   ENGLISH       NaN
#6     B12346      IDR   PHYSICS    ACTIVE
#7     B12347      NaN   PHYSICS       NaN
#8     B12348      THB       NaN    ACTIVE
#9     B12349      NaN       NaN       NaN

df[df.isna().any(axis=1)]

#  ID_STUDENT CURRENCY ID_COURSE    STATUS
#1     A12346      NaN       NaN  INACTIVE
#3     A12348      NaN   ENGLISH    ACTIVE
#4     A12349      IDR       NaN  INACTIVE
#5     B12345      IDR   ENGLISH       NaN
#7     B12347      NaN   PHYSICS       NaN
#8     B12348      THB       NaN    ACTIVE
#9     B12349      NaN       NaN       NaN

For the opposite, you can do:

df[~df.isna().any(axis=1)]                                                                                                                                                                       
#  ID_STUDENT CURRENCY ID_COURSE  STATUS
#0     A12345      IDR      MATH  ACTIVE
#2     A12347      SGD      MATH  ACTIVE
#6     B12346      IDR   PHYSICS  ACTIVE

Upvotes: 1

Related Questions