userD1989

Reputation: 47

How to check if both columns in a dataframe have values using pandas

I have a df:
col1   col2   col3   col4    col5
bat    cell   val            val
cat    ribo   val    val
rat    dna    val            val
dog    rna    val    val     val

If I compare col4 and col5, I want to get this output:

col1   col2   col3   col4    col5
dog    rna    val    val     val

because both col4 and col5 have a value in that row.

If I compare col3 and col5, I should get this output:

col1   col2   col3   col4    col5
bat    cell   val            val
rat    dna    val            val
dog    rna    val    val     val

But when I use the following code:

dfn = df[df['col4'] != df['col5']]

I do not get the expected rows.

I also want to save the result as a dataframe like this:

col1   col2   col3   col5
dog    rna    val    val

Upvotes: 0

Views: 57

Answers (2)

Erfan

Reputation: 42886

We can write a simple function that keeps only the rows where neither of the two columns is empty:

Method 1: using Boolean indexing with notnull

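import numpy as np

# Turn empty strings into NaN so that notnull() can detect them
df.replace('', np.nan, inplace=True)

def compare_cols(dataframe, column1, column2):
    # Keep the rows where both columns hold a value
    return dataframe[dataframe[column1].notnull() & dataframe[column2].notnull()]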

print(compare_cols(df, 'col4', 'col5'))
print('\n')
print(compare_cols(df, 'col3', 'col5'))

  col1 col2 col3 col4 col5
3  dog  rna  val  val  val


  col1  col2 col3 col4 col5
0  bat  cell  val  NaN  val
2  rat   dna  val  NaN  val
3  dog   rna  val  val  val
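
As a side note on why the code in the question returns the wrong rows: df['col4'] != df['col5'] tests whether the two values differ, not whether both are present. A minimal sketch, assuming the blank cells are empty strings as in the sample data (the frame is rebuilt by hand here, and the names differs and both_present are only illustrative):

```python
import pandas as pd

# Sample frame rebuilt by hand; blank cells are assumed to be empty strings.
df = pd.DataFrame({
    'col1': ['bat', 'cat', 'rat', 'dog'],
    'col2': ['cell', 'ribo', 'dna', 'rna'],
    'col3': ['val', 'val', 'val', 'val'],
    'col4': ['', 'val', '', 'val'],
    'col5': ['val', '', 'val', 'val'],
})

# True wherever the two values differ, which includes every row with one blank.
differs = df['col4'] != df['col5']

# True only where both columns hold a non-empty value.
both_present = df['col4'].ne('') & df['col5'].ne('')

print(df[differs]['col1'].tolist())       # ['bat', 'cat', 'rat']
print(df[both_present]['col1'].tolist())  # ['dog']
```

So the inequality test selects exactly the rows that should be dropped, which is why a presence check is needed instead.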

Edit after Jezrael's comment: we can also use dropna with subset, which gives the same output:

Method 2: using dropna

def compare_cols2(dataframe, column1, column2):
    # Drop the rows where either column is NaN
    return dataframe.dropna(subset=[column1, column2])

print(compare_cols2(df, 'col4', 'col5'))
print('\n')
print(compare_cols2(df, 'col3', 'col5'))

  col1 col2 col3 col4 col5
3  dog  rna  val  val  val


  col1  col2 col3 col4 col5
0  bat  cell  val  NaN  val
2  rat   dna  val  NaN  val
3  dog   rna  val  val  val

Note that I replaced the empty strings ('') with NaN first, so that notnull() and dropna() can detect the missing values.
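
Replacing '' only catches cells that are exactly empty. If some blanks in the real data are actually one or more spaces (an assumption, since the sample does not show this), a regex replace may be a safer first step. A rough sketch, with a made-up frame called messy:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: blanks appear as '' in some cells and as spaces in others.
messy = pd.DataFrame({
    'col1': ['bat', 'cat', 'rat', 'dog'],
    'col4': ['', 'val', '   ', 'val'],
    'col5': ['val', ' ', 'val', 'val'],
})

# Treat any cell that is empty or contains only whitespace as missing.
cleaned = messy.replace(r'^\s*$', np.nan, regex=True)

# Keep the rows where both columns hold a real value.
result = cleaned.dropna(subset=['col4', 'col5'])
print(result)
```

Here only the dog row survives, because every other row has a blank or whitespace-only cell in col4 or col5.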

Upvotes: 2

vrana95

Reputation: 521

    # You can try the following (blank cells must already be NaN):
    df1 = df.loc[df['col4'].notnull() & df['col5'].notnull(), :]
    print(df1)
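
The question also asks for a final frame without col4. One possible way, sketched under the assumption that the blanks are already NaN (the frame is rebuilt by hand, and the names both and trimmed are only illustrative), is to chain drop after the filter:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt by hand; blank cells are assumed to be NaN already.
df = pd.DataFrame({
    'col1': ['bat', 'cat', 'rat', 'dog'],
    'col2': ['cell', 'ribo', 'dna', 'rna'],
    'col3': ['val', 'val', 'val', 'val'],
    'col4': [np.nan, 'val', np.nan, 'val'],
    'col5': ['val', np.nan, 'val', 'val'],
})

# Keep the rows where both columns hold a value.
both = df.dropna(subset=['col4', 'col5'])

# Drop col4 so that only col1, col2, col3 and col5 remain.
trimmed = both.drop(columns='col4')
print(trimmed)
```

Both dropna and drop return new frames here, so the original df is left unchanged.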

Upvotes: 0
