anakaine
anakaine

Reputation: 1248

Python Pandas - using .loc to select with AND and OR on multiple columns

I've a situation where I'm trying to select some scenarios from a dataframe in a single pass. The following code is what I'm using at the moment:

dfWater1 = left_merged.loc[left_merged.BVG_2M.isin(['34']) and left_merged.VHC_SC.isin(['6. Nil veg']) and left_merged.wetland.isin(['Estuarine wetlands (e.g. mangroves).', 'Lacustrine wetland (e.g. lake).']) | left_merged.RE.isin(['water', 'reef', 'ocean', 'estuary', 'canal'])].copy()

Alternatively, with some extra brackets to contain the AND and separate the OR:

dfWater1 = left_merged.loc[(left_merged.BVG_2M.isin(['34']) and left_merged.VHC_SC.isin(['6. Nil veg']) and left_merged.wetland.isin(['Estuarine wetlands (e.g. mangroves).', 'Lacustrine wetland (e.g. lake).'])) | (left_merged.RE.isin(['water', 'reef', 'ocean', 'estuary', 'canal']))].copy()

Basically, I'm asking for rows to be selected where:

   (
      Column BVG_2M = 34
         AND
      Column VHC_SC = '6. Nil veg'
         AND
      Column wetland is one of the following ['Estuarine wetlands (e.g. mangroves).', 'Lacustrine wetland (e.g. lake).']
   )
OR
   (
      Column RE is one of the following ['water', 'reef', 'ocean', 'estuary', 'canal']
   )

The dataset is pretty big, so I'd like to try and keep the select fast (hence using .loc and approaching this in a vectorised fashion), and try to avoid creating more data frames than necessary to preserve memory, if possible.

My real issue I think, is that I'm not sure how to structure the .loc statement, or even if I can do it like this.

Error Message

File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\pandas\core\generic.py", line 1479, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 2

Views: 643

Answers (1)

David Erickson
David Erickson

Reputation: 16683

You should use & instead of and as well as wrap parentheses around each condition. Formatting on new lines with everything lining up also helps prevent mistakes with parentheses:

dfWater1 = left_merged.loc[((left_merged.BVG_2M.isin(['34'])) &
                            (left_merged.VHC_SC.isin(['6. Nil veg'])) &
                            (left_merged.wetland.isin(['Estuarine wetlands (e.g. mangroves).', 'Lacustrine wetland (e.g. lake).']))) 
                          | (left_merged.RE.isin(['water', 'reef', 'ocean', 'estuary', 'canal']))].copy()

Upvotes: 2

Related Questions