LivingstoneM

Reputation: 1088

Delete rows with missing values from certain location in data frame python

Good people, I'm still learning Python. I have a very large dataframe of around 1 million rows, and I want to delete rows that have missing values in certain columns. That is, after locating the rows with missing values in those columns, I should be able to delete each entire row. Look at the sample dataframe:

import pandas as pd 

data = {'A':['NW', 'NB', 'UK', 'CAN'],'B':['Tom', 'nick', 'krish', 'jack'], 'C':[20, '', 19, ''],'D':[20, '', 19, ''],'E':[20, '', 19, ''],'F':[20, '', 19, '']} 
df = pd.DataFrame(data) 
print(df)

In this dataframe, I want to identify rows with missing values from column C through column F and then delete the entire row, which means I have to delete rows 1 and 3. I have tried this, but it isn't working:

df.dropna(subset=['C','D','E','F'], how='all', inplace = True)

EXPECTED OUTPUT should be a table like this

import pandas as pd 

data = {'A':['NW', 'UK'],'B':['Tom', 'krish'], 'C':[20, 19],'D':[20, 19 ],'E':[20,  19 ],'F':[20, 19]} 
df = pd.DataFrame(data) 
print(df)

Upvotes: 2

Views: 1939

Answers (3)

piRSquared

Reputation: 294576

pandas.DataFrame.mask

  • Use mask to make the dataframe null anywhere a condition is True
  • Then use dropna with the subset argument

This is an alternative to using replace as shown by zipa and jezrael. There isn't any particular advantage over the other methods; I just like using mask, and it serves the community to show another way of accomplishing the task.


df.mask(df.eq('')).dropna(subset=['C', 'D', 'E', 'F'])

    A      B   C   D   E   F
0  NW    Tom  20  20  20  20
2  UK  krish  19  19  19  19
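Put together as a self-contained script (a sketch reusing the question's sample data; note that for this data the default how='any' and the OP's how='all' select the same rows, since columns C through F are blank together):

```python
import pandas as pd

# Sample data from the question: rows 1 and 3 have '' in columns C..F.
data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''],
        'D': [20, '', 19, ''],
        'E': [20, '', 19, ''],
        'F': [20, '', 19, '']}
df = pd.DataFrame(data)

# mask(df.eq('')) turns every empty string into NaN;
# dropna then removes rows that are missing in the subset columns.
cleaned = df.mask(df.eq('')).dropna(subset=['C', 'D', 'E', 'F'])
print(cleaned)  # rows 0 and 2 remain
```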

Upvotes: 4

zipa

Reputation: 27899

This should do it:

import numpy as np

df.replace(r'^\s*$', np.nan, regex=True).dropna()
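One caveat worth noting: without a subset argument, dropna() drops rows with a missing value in any column, which happens to be fine here because only columns C through F contain blanks. A sketch restricting the check to those columns (my addition, not part of the original one-liner):

```python
import numpy as np
import pandas as pd

# Sample data from the question.
data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''],
        'D': [20, '', 19, ''],
        'E': [20, '', 19, ''],
        'F': [20, '', 19, '']}
df = pd.DataFrame(data)

# Replace empty or whitespace-only strings with NaN, then drop rows
# where all of C..F are missing.
cleaned = (df.replace(r'^\s*$', np.nan, regex=True)
             .dropna(subset=['C', 'D', 'E', 'F'], how='all'))
print(cleaned)
```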

Upvotes: 3

jezrael

Reputation: 863791

One solution is to compare the list of columns against empty strings and keep rows that have at least one non-empty value:

df = df[df[['C','D','E','F']].ne('').any(axis=1)]

Details:

print(df[['C','D','E','F']].ne(''))
       C      D      E      F
0   True   True   True   True
1  False  False  False  False
2   True   True   True   True
3  False  False  False  False

print(df[['C','D','E','F']].ne('').any(axis=1))
0     True
1    False
2     True
3    False
dtype: bool

Alternative with inverted condition by ~:

df = df[~df[['C','D','E','F']].eq('').all(axis=1)]

Details:

print(df[['C','D','E','F']].eq(''))
       C      D      E      F
0  False  False  False  False
1   True   True   True   True
2  False  False  False  False
3   True   True   True   True

print(df[['C','D','E','F']].eq('').all(axis=1))
0    False
1     True
2    False
3     True
dtype: bool

print(~df[['C','D','E','F']].eq('').all(axis=1))
0     True
1    False
2     True
3    False
dtype: bool

print(df)
    A      B   C   D   E   F
0  NW    Tom  20  20  20  20
2  UK  krish  19  19  19  19
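By De Morgan's law the two conditions are equivalent, which a quick sketch on the sample data can confirm (my own check, not part of the original answer):

```python
import pandas as pd

# Sample data from the question.
data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''],
        'D': [20, '', 19, ''],
        'E': [20, '', 19, ''],
        'F': [20, '', 19, '']}
df = pd.DataFrame(data)

cols = ['C', 'D', 'E', 'F']
keep_any = df[cols].ne('').any(axis=1)   # at least one non-empty value
keep_all = ~df[cols].eq('').all(axis=1)  # not every value empty

# The two boolean masks are identical.
print(keep_any.equals(keep_all))  # True
```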

To make your original solution work, you first need to replace the empty strings with missing values:

import numpy as np

df = df.replace('', np.nan)
df.dropna(subset=['C','D','E','F'], how='all', inplace=True)
print(df)
    A      B     C     D     E     F
0  NW    Tom  20.0  20.0  20.0  20.0
2  UK  krish  19.0  19.0  19.0  19.0
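As the output shows, replacing '' with NaN promotes columns C through F to float (20 becomes 20.0). If integer output matters, one option (my addition, using pandas' nullable Int64 dtype, not part of the original answer) is to convert back afterwards:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
data = {'A': ['NW', 'NB', 'UK', 'CAN'],
        'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''],
        'D': [20, '', 19, ''],
        'E': [20, '', 19, ''],
        'F': [20, '', 19, '']}
df = pd.DataFrame(data)

df = df.replace('', np.nan)
df.dropna(subset=['C', 'D', 'E', 'F'], how='all', inplace=True)

# Nullable Int64 keeps integers while still tolerating any NaN left over.
df[['C', 'D', 'E', 'F']] = df[['C', 'D', 'E', 'F']].astype('Int64')
print(df)
```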

Upvotes: 2
