Reputation:
I have a very large DataFrame with many columns (almost 300). I would like to remove all rows in which the values of all columns, except a column called 'Country', are NaN.
dropna can remove rows in which all or some values are NaN. But what is an efficient way to do it when there's a column you want to exclude from the process?
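For reference, a minimal sketch of why the plain call falls short (the column names here are illustrative): with how="all", a row survives as long as any column is populated, so a never-NaN 'Country' column keeps every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 1.0], "B": [np.nan, np.nan], "Country": ["US", "FR"]})

# how="all" only drops a row when every column is NaN, so the
# always-populated 'Country' column keeps both rows here
result = df.dropna(how="all")
print(result)
```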
Upvotes: 2
Views: 2838
Reputation: 162
This should work:
def remove_if_eq(v, dataframe, keys):
    # drop each row whose value equals v in every one of the given columns
    for i in dataframe.index:
        delete = True
        for j in keys:
            cell = dataframe.at[i, j]
            # NaN never compares equal to itself, so test it explicitly
            if not (cell == v or (pd.isna(v) and pd.isna(cell))):
                delete = False
                break
        if delete:
            dataframe = dataframe.drop(i)
    return dataframe
v is the value to match, dataframe is the DataFrame, and keys are the names of the columns you want searched.
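A self-contained usage sketch (the NaN-aware comparison via pd.isna is an assumption, since a plain != test can never match NaN):

```python
import numpy as np
import pandas as pd

def remove_if_eq(v, dataframe, keys):
    # drop each row whose value equals v in every one of the given columns
    def matches(cell):
        # NaN never compares equal to itself, so test it explicitly
        return pd.isna(cell) if pd.isna(v) else cell == v
    for i in dataframe.index:
        if all(matches(dataframe.at[i, j]) for j in keys):
            dataframe = dataframe.drop(i)
    return dataframe

df = pd.DataFrame({"A": [np.nan, 2.0], "B": [np.nan, 4.0], "Country": ["US", "FR"]})
# drop rows where both A and B are NaN, leaving 'Country' out of the check
out = remove_if_eq(np.nan, df, ["A", "B"])
print(out)
```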
Upvotes: 0
Reputation: 22493
You can use filter to exclude the column Country:
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 2, 3], "Countryside": [np.nan, 4, 6], "Country": list("ABC")})
A Countryside Country
0 NaN NaN A
1 2.0 4.0 B
2 3.0 6.0 C
print(df[~df.filter(regex=r"^(?!Country\b)").isnull().all(1)])
A Countryside Country
1 2.0 4.0 B
2 3.0 6.0 C
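The \b in the lookahead matters: it rejects only the exact word Country while still keeping Countryside. A quick check of which columns the filter selects:

```python
import pandas as pd

df = pd.DataFrame({"A": [1], "Countryside": [2], "Country": ["X"]})

# ^(?!Country\b) fails only for the exact label "Country";
# "Countryside" passes because 's' is a word character, so \b cannot match
kept = df.filter(regex=r"^(?!Country\b)").columns.tolist()
print(kept)  # ['A', 'Countryside']
```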
Upvotes: 0
Reputation: 75080
Try:
mask = df.drop("Country",axis=1).isna().all(1) & df['Country'].notna()
out = df[~mask]
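A similar result can be reached with dropna itself, using subset to leave 'Country' out of the check. Note this is a sketch and not exactly equivalent: it drops qualifying rows whether or not 'Country' is NaN, whereas the mask above additionally requires 'Country' to be present.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 2.0], "B": [np.nan, 4.0], "Country": ["US", "FR"]})

# keep a row only if at least one column other than 'Country' is non-NaN
out = df.dropna(how="all", subset=df.columns.difference(["Country"]))
print(out)
```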
Upvotes: 1