Filter out according to specific multiple conditions in pandas

Question

I have the following dataframe that I read from csv file:

   gene name  annotation  ng DNA
 0  HRAS       G12S        3.00
 1  PIK3CA     R88L        3.00
 2  BRAF       E474A       3.00
 3  EGFR       E734Q       3.00
 4  EGFR       V769        3.00
 5  BRAF       LQ599PE     4.00
 6  BRAF       KT587NA     4.00
 7  HRAS       G12S        17.70

I'd like to filter according to multiple conditions in 2 columns: for example to filter according to 'BRAF' + 'E474A' and 'HRAS'+ 'G12S', so the following df will be created:

   gene name  annotation  ng DNA
 0  HRAS       G12S        3.00
 2  BRAF       E474A       3.00
 7  HRAS       G12S        17.70

Any ideas for an elegant solution?

jezrael · Accepted Answer

Use boolean indexing and for join all masks to one by np.logical_or.reduce:

m1 = (df['gene name'] == 'BRAF') & (df['annotation'] == 'E474A')
m2 = (df['gene name'] == 'HRAS') & (df['annotation'] == 'G12S')

df = df[np.logical_or.reduce([m1, m2])]
print (df)
  gene name annotation  ng DNA
0      HRAS       G12S     3.0
2      BRAF      E474A     3.0
7      HRAS       G12S    17.7

More dynamic solution with list of tuples for values for filter in list comprehension:

tup = [('BRAF','E474A'), ('HRAS', 'G12S')]
df = df[np.logical_or.reduce([(df['gene name']== a)&(df['annotation']== b) for a, b in tup])]
print (df)
  gene name annotation  ng DNA
0      HRAS       G12S     3.0
2      BRAF      E474A     3.0
7      HRAS       G12S    17.7

Filter out according to specific multiple conditions in pandas

Answers (1)

Related Questions