Reputation: 1088
Good people, still learning Python. I have a very large dataframe of around 1 million rows, and I want to delete rows that have missing values in certain columns. That is, after locating rows with missing values in those columns, I should be able to delete the entire row. Look at the sample dataframe:
import pandas as pd
data = {'A':['NW', 'NB', 'UK', 'CAN'],'B':['Tom', 'nick', 'krish', 'jack'], 'C':[20, '', 19, ''],'D':[20, '', 19, ''],'E':[20, '', 19, ''],'F':[20, '', 19, '']}
df = pd.DataFrame(data)
print(df)
In this dataframe, I want to identify rows with missing values anywhere from column C to column F and then delete the entire row, which means I have to delete row 1 and row 3. I have tried this, but it isn't working:
df.dropna(subset=['C','D','E','F'], how='all', inplace = True)
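A quick check (a sketch on the sample data) shows why that call does nothing: pandas does not treat the empty string '' as missing, so dropna finds no nulls at all.

```python
import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'], 'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''], 'D': [20, '', 19, ''],
        'E': [20, '', 19, ''], 'F': [20, '', 19, '']}
df = pd.DataFrame(data)

# '' is a valid string, so pandas sees no missing values in C-F
print(df[['C', 'D', 'E', 'F']].isna().sum().sum())  # 0

# hence dropna(how='all') keeps every row
print(len(df.dropna(subset=['C', 'D', 'E', 'F'], how='all')))  # 4
```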
The EXPECTED OUTPUT should be a table like this:
import pandas as pd
data = {'A':['NW', 'UK'],'B':['Tom', 'krish'], 'C':[20, 19],'D':[20, 19 ],'E':[20, 19 ],'F':[20, 19]}
df = pd.DataFrame(data)
print(df)
Upvotes: 2
Views: 1939
Reputation: 294576
Use pandas.DataFrame.mask to make the dataframe null anywhere a condition is True, then dropna with the subset argument. This is instead of using replace as mentioned by zipa and jezrael. There isn't any advantage to using this over the other methods; I just like using mask, and it also serves the community to show another method for accomplishing the task.
df.mask(df.eq('')).dropna(subset=['C', 'D', 'E', 'F'])
A B C D E F
0 NW Tom 20 20 20 20
2 UK krish 19 19 19 19
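Run end to end on the question's sample data, the intermediate result makes the mechanics clear (a sketch; mask turns every '' into NaN, then dropna filters on the subset):

```python
import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'], 'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''], 'D': [20, '', 19, ''],
        'E': [20, '', 19, ''], 'F': [20, '', 19, '']}
df = pd.DataFrame(data)

# mask replaces cells where the condition is True with NaN;
# rows 1 and 3 now hold NaN in columns C-F
masked = df.mask(df.eq(''))
print(masked)

# dropna with how='any' (the default) on the subset drops those rows
result = masked.dropna(subset=['C', 'D', 'E', 'F'])
print(result)
```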
Upvotes: 4
Reputation: 27899
This should do it:
import numpy as np
df.replace(r'^\s*$', np.nan, regex=True).dropna()
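Note that the regex r'^\s*$' also catches whitespace-only cells, not just '', and that dropna here has no subset, so a NaN anywhere in the row drops it. A self-contained sketch on the question's data:

```python
import numpy as np
import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'], 'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''], 'D': [20, '', 19, ''],
        'E': [20, '', 19, ''], 'F': [20, '', 19, '']}
df = pd.DataFrame(data)

# empty (or whitespace-only) strings become NaN, then NaN rows are dropped
out = df.replace(r'^\s*$', np.nan, regex=True).dropna()
print(out)
```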
Upvotes: 3
Reputation: 863791
One solution is to compare the list of columns against empty strings and keep the rows that have at least one value that is not '':
df = df[df[['C','D','E','F']].ne('').any(axis=1)]
Details:
print(df[['C','D','E','F']].ne(''))
C D E F
0 True True True True
1 False False False False
2 True True True True
3 False False False False
print(df[['C','D','E','F']].ne('').any(axis=1))
0 True
1 False
2 True
3 False
dtype: bool
An alternative is to invert the condition with ~:
df = df[~df[['C','D','E','F']].eq('').all(axis=1)]
Details:
print(df[['C','D','E','F']].eq(''))
C D E F
0 False False False False
1 True True True True
2 False False False False
3 True True True True
print(df[['C','D','E','F']].eq('').all(axis=1))
0 False
1 True
2 False
3 True
dtype: bool
print(~df[['C','D','E','F']].eq('').all(axis=1))
0 True
1 False
2 True
3 False
dtype: bool
print(df)
A B C D E F
0 NW Tom 20 20 20 20
2 UK krish 19 19 19 19
For your solution to work, it is necessary to replace the empty strings with missing values first:
import numpy as np
df = df.replace('', np.nan)
df.dropna(subset=['C','D','E','F'], how='all', inplace=True)
print(df)
A B C D E F
0 NW Tom 20.0 20.0 20.0 20.0
2 UK krish 19.0 19.0 19.0 19.0
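Notice the 20.0 in the output: replacing '' with NaN pushes the numeric columns to float. Once the NaN rows are gone you can cast back to integers; a sketch of the whole flow:

```python
import numpy as np
import pandas as pd

data = {'A': ['NW', 'NB', 'UK', 'CAN'], 'B': ['Tom', 'nick', 'krish', 'jack'],
        'C': [20, '', 19, ''], 'D': [20, '', 19, ''],
        'E': [20, '', 19, ''], 'F': [20, '', 19, '']}
df = pd.DataFrame(data)

df = df.replace('', np.nan)
df = df.dropna(subset=['C', 'D', 'E', 'F'], how='all')

# with the NaN rows dropped, C-F can be cast back to int
df[['C', 'D', 'E', 'F']] = df[['C', 'D', 'E', 'F']].astype(int)
print(df)
```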
Upvotes: 2