pwwolff
pwwolff

Reputation: 626

Update Where Equivalent in Pandas

I have a list of columns in a dataframe that shouldn't be empty.

I want to remove any rows that are empty in any of these columns. My solution would be to iterate through the required columns and set the column 'excluded' to the error message that the user will be shown before excluding them (I will present these to the user in the form of a report at the end of the process)

I'm currently trying something like this:

for col in requiredColumns:
    df[pd.isnull(df[col])]['excluded'] = df[pd.isnull(df[col])]['excluded'].apply(lambda x: str(x) + col + ' empty, excluded')

but no luck - the columns aren't updated. The filter by itself (to get only the empty rows) works, the update part doesn't seem to be working.

I'm used to SQL:

UPDATE df SET e = e & "empty, excluded" WHERE NZ(col, '') = ''

Upvotes: 5

Views: 2055

Answers (2)

GavinBelson
GavinBelson

Reputation: 2784

If you need to update a panda based on multiple conditions:

You can simply use .loc

>>> df
      A   B    C
0     2  40  800
1     1  90  600
2     6  80  700
3  1998  70   55
4     1  90  300
5     7  80  700
6     4  20  300
7  1998  20    2
8     7  10  100
9  1998  60    2

>>> df.loc[(df['A'] > 7) & (df['B'] > 69) , 'C'] = 75

This will set 'C' = 75 where 'A' > 7 and 'B' > 69

Upvotes: 4

jpp
jpp

Reputation: 164623

One way is to use numpy functions to create a column with the desired marker.

Setup

import pandas as pd, numpy as np

df = pd.DataFrame({'A': [1, np.nan, 2, 3, 4, 5],
                   'B': [2, 3, np.nan, 5, 1, 9],
                   'C': [5, 8, 1, 9, np.nan, 7]})

     A    B    C
0  1.0  2.0  5.0
1  NaN  3.0  8.0
2  2.0  NaN  1.0
3  3.0  5.0  9.0
4  4.0  1.0  NaN
5  5.0  9.0  7.0

Solution

df['test'] = np.any(np.isnan(df.values), axis=1)

     A    B    C   test
0  1.0  2.0  5.0  False
1  NaN  3.0  8.0   True
2  2.0  NaN  1.0   True
3  3.0  5.0  9.0  False
4  4.0  1.0  NaN   True
5  5.0  9.0  7.0  False

Explanation

  • np.isnan returns a Boolean array corresponding to whether the elements of a numpy array are null.
  • Use np.any or np.all, as required, to determine which rows are in scope.
  • Use df.values to extract underlying numpy array from dataframe. For selected columns, you can use df[['A', 'B']].values.

Upvotes: 3

Related Questions