Reputation: 626
I have a list of columns in a dataframe that shouldn't be empty.
I want to remove any rows that are empty in any of these columns. My solution would be to iterate through the required columns and set the column 'excluded' to the error message that the user will be shown before excluding them (I will present these to the user in the form of a report at the end of the process)
I'm currently trying something like this:
for col in requiredColumns:
df[pd.isnull(df[col])]['excluded'] = df[pd.isnull(df[col])]['excluded'].apply(lambda x: str(x) + col + ' empty, excluded')
but no luck - the columns aren't updated. The filter by itself (to get only the empty rows) works, the update part doesn't seem to be working.
I'm used to SQL:
UPDATE df SET e = e & "empty, excluded" WHERE NZ(col, '') = ''
Upvotes: 5
Views: 2055
Reputation: 2784
If you need to update a panda based on multiple conditions:
You can simply use .loc
>>> df
A B C
0 2 40 800
1 1 90 600
2 6 80 700
3 1998 70 55
4 1 90 300
5 7 80 700
6 4 20 300
7 1998 20 2
8 7 10 100
9 1998 60 2
>>> df.loc[(df['A'] > 7) & (df['B'] > 69) , 'C'] = 75
This will set 'C' = 75 where 'A' > 7 and 'B' > 69
Upvotes: 4
Reputation: 164623
One way is to use numpy
functions to create a column with the desired marker.
Setup
import pandas as pd, numpy as np
df = pd.DataFrame({'A': [1, np.nan, 2, 3, 4, 5],
'B': [2, 3, np.nan, 5, 1, 9],
'C': [5, 8, 1, 9, np.nan, 7]})
A B C
0 1.0 2.0 5.0
1 NaN 3.0 8.0
2 2.0 NaN 1.0
3 3.0 5.0 9.0
4 4.0 1.0 NaN
5 5.0 9.0 7.0
Solution
df['test'] = np.any(np.isnan(df.values), axis=1)
A B C test
0 1.0 2.0 5.0 False
1 NaN 3.0 8.0 True
2 2.0 NaN 1.0 True
3 3.0 5.0 9.0 False
4 4.0 1.0 NaN True
5 5.0 9.0 7.0 False
Explanation
np.isnan
returns a Boolean array corresponding to whether the elements of a numpy
array are null.np.any
or np.all
, as required, to determine which rows are in scope.df.values
to extract underlying numpy
array from dataframe. For selected columns, you can use df[['A', 'B']].values
.Upvotes: 3