Jean-baptiste Briois

Reputation: 47

Python Pandas drop

I built a script in Python that uses Pandas. I'm trying to delete rows from a DataFrame: I want to remove the rows where two specific columns (B and C) are both empty. If only one of those two columns is filled in, the row must be kept. The code below works, but I'm a beginner and I'm sure it can be simplified. In particular, I don't think I need a "for" loop in my function; there is probably a suitable method for this. I read the documentation online but found nothing. I tried my best, but I need help. Also, for various reasons, I don't want to use NumPy.

Here is my code:

import pandas as pnd


def drop_empty_line(df):
    # Index labels of the rows where both B and C are empty
    a = df[(df["B"].isna()) & (df["C"].isna())].index
    # Drop those rows one label at a time
    for i in a:
        df = df.drop([i])
    return df
    
    
def main():
    df = pnd.DataFrame({
            "A": [5, 0, 4, 6, 5], 
            "B": [pnd.NA, 4, pnd.NA, pnd.NA, 5], 
            "C": [pnd.NA, pnd.NA, 9, pnd.NA, 8], 
            "D": [5, 3, 8, 5, 2], 
            "E": [pnd.NA, 4, 2, 0, 3]
            })
    
    print(drop_empty_line(df))
    
    
if __name__ == '__main__':
    main()
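The loop in drop_empty_line is not needed even if you keep the same filtering idea: DataFrame.drop accepts a list-like of index labels, so all matching rows can be dropped in a single call. A minimal sketch, where drop_empty_line_no_loop is a hypothetical name and the data is the question's example:

```python
import pandas as pnd


def drop_empty_line_no_loop(df):
    # Index labels of the rows where both B and C are missing
    empty_rows = df[df["B"].isna() & df["C"].isna()].index
    # drop() takes a list-like of labels, so no loop is required
    return df.drop(index=empty_rows)


df = pnd.DataFrame({
    "A": [5, 0, 4, 6, 5],
    "B": [pnd.NA, 4, pnd.NA, pnd.NA, 5],
    "C": [pnd.NA, pnd.NA, 9, pnd.NA, 8],
    "D": [5, 3, 8, 5, 2],
    "E": [pnd.NA, 4, 2, 0, 3],
})

print(drop_empty_line_no_loop(df).index.tolist())  # [1, 2, 4]
```

Unlike the loop, this builds only one new DataFrame instead of one per dropped row.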

Upvotes: 1

Views: 218

Answers (1)

mozway

Reputation: 260290

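You indeed don't need a loop. You don't even need a custom function: DataFrame.dropna already does this. Pass the columns to check as subset, and use how='all' so that a row is dropped only when all of those columns are empty: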

df = df.dropna(subset=['B', 'C'], how='all')
# or in place:
# df.dropna(subset=['B', 'C'], how='all', inplace=True)

output:

   A     B     C  D  E
1  0     4  <NA>  3  4
2  4  <NA>     9  8  2
4  5     5     8  2  3
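For reference, how='all' drops a row only when every column listed in subset is missing, while the default how='any' drops it as soon as one of them is missing. The same selection can also be written as a boolean mask, still without NumPy, which is useful if the condition becomes more complex. A sketch on the question's data (kept and strict are illustrative names):

```python
import pandas as pnd

df = pnd.DataFrame({
    "A": [5, 0, 4, 6, 5],
    "B": [pnd.NA, 4, pnd.NA, pnd.NA, 5],
    "C": [pnd.NA, pnd.NA, 9, pnd.NA, 8],
    "D": [5, 3, 8, 5, 2],
    "E": [pnd.NA, 4, 2, 0, 3],
})

# Keep a row if at least one of B or C is filled (same rows as how="all")
mask = df[["B", "C"]].notna().any(axis=1)
kept = df[mask]

# how="any" would instead drop every row where B or C is missing
strict = df.dropna(subset=["B", "C"], how="any")

print(kept.index.tolist())    # [1, 2, 4]
print(strict.index.tolist())  # [4]
```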

Upvotes: 2
