twegner

Reputation: 443

pandas remove rows with multiple criteria

Consider the following pandas DataFrame:

df = pd.DataFrame({
    'case_id': [1050, 1050, 1050, 1050, 1051, 1051, 1051, 1051],
    'elm_id': [101, 102, 101, 102, 101, 102, 101, 102],
    'cid': [1, 1, 2, 2, 1, 1, 2, 2],
    'fx': [736.1, 16.5, 98.8, 158.5, 272.5, 750.0, 333.4, 104.2],
    'fy': [992.0, 261.3, 798.3, 452.0, 535.9, 838.8, 526.7, 119.4],
    'fz': [428.4, 611.0, 948.3, 523.9, 880.9, 340.3, 890.7, 422.1]})

When printed, it looks like this:

   case_id  cid  elm_id     fx     fy     fz
0     1050    1     101  736.1  992.0  428.4
1     1050    1     102   16.5  261.3  611.0
2     1050    2     101   98.8  798.3  948.3
3     1050    2     102  158.5  452.0  523.9
4     1051    1     101  272.5  535.9  880.9
5     1051    1     102  750.0  838.8  340.3
6     1051    2     101  333.4  526.7  890.7
7     1051    2     102  104.2  119.4  422.1

I need to remove rows where 'case_id' is in one list and 'cid' is in another list. For simplicity, let's use lists with a single value each: cases = [1051] and ids = [1]. In this scenario I want the new DataFrame to have 6 rows. Two rows match both criteria and should be removed, so the result should look like this:

   case_id  cid  elm_id     fx     fy     fz
0     1050    1     101  736.1  992.0  428.4
1     1050    1     102   16.5  261.3  611.0
2     1050    2     101   98.8  798.3  948.3
3     1050    2     102  158.5  452.0  523.9
4     1051    2     101  333.4  526.7  890.7
5     1051    2     102  104.2  119.4  422.1

I've tried a few different things like:

df2 = df[(df.case_id != subcase) & (df.cid != commit_id)]

But this returns the inverse of what I was expecting:

2     1050    2     101   98.8  798.3  948.3
3     1050    2     102  158.5  452.0  523.9

I've also tried using .query(): df.query('(case_id != 1051) & (cid != 1)') but got the same (2) rows of results.

Any help and/or explanations would be greatly appreciated.

Upvotes: 0

Views: 85

Answers (1)

harpan

Reputation: 8631

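Your filter keeps only the rows where both 'case_id' and 'cid' differ from the given values, which is not the same as removing the rows that match both criteria. Instead, select the rows that match both criteria and drop them by index using .drop():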

df.drop(df.loc[(df['case_id'].isin(cases)) & (df['cid'].isin(ids))].index)

Output:

   case_id  cid  elm_id     fx     fy     fz
0     1050    1     101  736.1  992.0  428.4
1     1050    1     102   16.5  261.3  611.0
2     1050    2     101   98.8  798.3  948.3
3     1050    2     102  158.5  452.0  523.9
6     1051    2     101  333.4  526.7  890.7
7     1051    2     102  104.2  119.4  422.1
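
For comparison, here is a minimal sketch of some equivalent ways to get the same six rows. It uses the DataFrame and the lists cases and ids from the question. The names to_remove, df2, df3 and df4 are made up for illustration. The point is that negating "A and B" gives "not A or not B", which is why combining the two != tests with & removed too many rows. The reset_index(drop=True) call is optional and only renumbers the rows 0 to 5, as in the expected output of the question.

```python
import pandas as pd

df = pd.DataFrame({
    'case_id': [1050, 1050, 1050, 1050, 1051, 1051, 1051, 1051],
    'elm_id': [101, 102, 101, 102, 101, 102, 101, 102],
    'cid': [1, 1, 2, 2, 1, 1, 2, 2],
    'fx': [736.1, 16.5, 98.8, 158.5, 272.5, 750.0, 333.4, 104.2],
    'fy': [992.0, 261.3, 798.3, 452.0, 535.9, 838.8, 526.7, 119.4],
    'fz': [428.4, 611.0, 948.3, 523.9, 880.9, 340.3, 890.7, 422.1]})

cases = [1051]
ids = [1]

# Boolean mask of the rows to remove: case_id in cases AND cid in ids.
to_remove = df['case_id'].isin(cases) & df['cid'].isin(ids)

# Keep the complement of the mask and renumber the index.
df2 = df[~to_remove].reset_index(drop=True)

# Same filter written with !=; note the OR (|) instead of AND (&).
# This form only works because each list holds a single value.
df3 = df[(df['case_id'] != 1051) | (df['cid'] != 1)].reset_index(drop=True)

# Same filter with .query(); @ refers to the local Python variables.
df4 = df.query('case_id not in @cases or cid not in @ids').reset_index(drop=True)

print(df2)
```

The isin-based mask is the most general of the three, since it keeps working when cases and ids contain more than one value.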

Upvotes: 3
