OptimusPrime
OptimusPrime

Reputation: 619

How can I filter a pandas dataset based on multiple columns?

I am trying some ifthen logic but I am working in a dataframe and couldn't find any examples.

What I am trying to do is filter this dataset to only include values where col1=col3 and col2=col4

col1       col2     col3       col4
Wagner     John     Wagner     John
Jane       Mary     Klein      Peter 
Schneider  Megan    Wicker     Sam
Schneider  Megan    Schneider  Megan

result

col1       col2     col3        col4
Wagner     John     Wagner      John
Schneider  Megan    Schneider   Megan

My code here doesn't work

 df1.apply(lambda x : x['col1'] if x['col1'] == x['col1'] and x['col2'] == x['col2'] else "", axis=1

Upvotes: 2

Views: 2723

Answers (2)

piRSquared
piRSquared

Reputation: 294508

messing around with numpy and assuming the columns are the specific order they are

df[np.equal(*df.values.T.reshape(2, 2, -1)).all(0)]

        col1   col2       col3   col4
0     Wagner   John     Wagner   John
3  Schneider  Megan  Schneider  Megan

If columns were in another order

cols = ['col1', 'col2', 'col3', 'col4']
v = np.column_stack([df[c].values for c in cols])
df[np.equal(*v.T.reshape(2, 2, -1)).all(0)]

Upvotes: 1

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210942

I'd use DataFrame.query() method:

In [205]: df.query("col1==col3 and col2==col4")
Out[205]:
        col1   col2       col3   col4
0     Wagner   John     Wagner   John
3  Schneider  Megan  Schneider  Megan

or a "classical" approach:

In [206]: df.loc[(df.col1==df.col3) & (df.col2==df.col4)]
Out[206]:
        col1   col2       col3   col4
0     Wagner   John     Wagner   John
3  Schneider  Megan  Schneider  Megan

Upvotes: 4

Related Questions