Reputation: 2500
I have a dataframe that looks something like :
LastName Date ObjectCol1 ObjectCol2 ObjectCol3 NumCol1 NumCol2 NumCol3 Intermediate1 Intermediate2
ABC March NA NA
ABC June NA NA
XYZ March NA Danger
XYZ June Danger Danger2
XYZ July Danger NA
AAA March NA NA
AAA June NA NA
DEF March NA NA
DEF June Danger Danger2
DEF July Danger NA
I want to split this dataframe into two dataframes such that only those LastNames which have "always" (meaning at all dates) have had "NA" (string object) (not null) in the Intermediate1 and 2 columns will be filtered into this 1 dataframe. And the second dataframe will have those records where any of the LastNames has had "Non NA" value (atleast once in the date history) in the Intermediate* columns
So for the sample df above, rows with ABC and AAA will go into one dataframe. and rows with XYZ and DEF should go into another dataframe.
I am looking at groupby and filtering, however the date column handling isnt happening well. Can you suggest anything ?
Upvotes: 1
Views: 57
Reputation: 863166
First get all LastName
values which not matched missing values by boolean indexing
with DataFrame.isna
and DataFrame.all
and then test original values of LastName
with isin
and inverted mask by ~
for df1
, non inverted for df2
:
vals = df.loc[~df[['Intermediate1','Intermediate2']].isna().all(axis=1),'LastName']
mask = df['LastName'].isin(vals)
Alternative solution with GroupBy.transform
and GroupBy.any
for test if at least one True
per group:
m = ~df[['Intermediate1','Intermediate2']].isna().all(axis=1)
mask = m.groupby(df['LastName']).transform('any')
df1 = df[~mask]
df2 = df[mask]
print (df1)
LastName Date ObjectCol1 ObjectCol2 ObjectCol3 NumCol1 NumCol2 \
0 ABC March NaN NaN NaN NaN NaN
1 ABC June NaN NaN NaN NaN NaN
5 AAA March NaN NaN NaN NaN NaN
6 AAA June NaN NaN NaN NaN NaN
NumCol3 Intermediate1 Intermediate2
0 NaN NaN NaN
1 NaN NaN NaN
5 NaN NaN NaN
6 NaN NaN NaN
print (df2)
LastName Date ObjectCol1 ObjectCol2 ObjectCol3 NumCol1 NumCol2 \
2 XYZ March NaN NaN NaN NaN NaN
3 XYZ June NaN NaN NaN NaN NaN
4 XYZ July NaN NaN NaN NaN NaN
7 DEF March NaN NaN NaN NaN NaN
8 DEF June NaN NaN NaN NaN NaN
9 DEF July NaN NaN NaN NaN NaN
NumCol3 Intermediate1 Intermediate2
2 NaN NaN Danger
3 NaN Danger Danger2
4 NaN Danger NaN
7 NaN NaN NaN
8 NaN Danger Danger2
9 NaN Danger NaN
Upvotes: 1