Reputation: 886
I am trying to remove rows from a DataFrame where the NumPy array stored in the array column contains null (NaN) values.
DataFrame:
name array
A [nan, nan, nan]
B [111.425818592, -743.060293425, -180.420675659]
Expected output:
name array
B [111.425818592, -743.060293425, -180.420675659]
My attempt:
df = df[df['array'].apply(lambda x: np.where(~np.isnan(x)))]
The error I am getting is:
TypeError: unhashable type: 'numpy.ndarray'
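The attempt fails because np.where returns a tuple of index arrays rather than a single True/False value per row, so the Series produced by apply cannot be used as a boolean mask. A minimal sketch of the same apply idea, assuming every cell holds a float NumPy array, has the lambda return one boolean per row instead:

```python
import numpy as np
import pandas as pd

# Sample data built the same way as in jpp's answer below
df = pd.DataFrame([['A', np.array([np.nan, np.nan, np.nan])],
                   ['B', np.array([111.425818592, -743.060293425, -180.420675659])]],
                  columns=['name', 'array'])

# One boolean per row: True when every element of that row's array is NaN
all_nan = df['array'].apply(lambda x: np.isnan(x).all())

# Keep only the rows whose arrays are not entirely NaN
result = df[~all_nan]
print(result)
```

Swapping .all() for .any() in the lambda would instead drop rows containing at least one NaN.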
Upvotes: 2
Views: 1202
Reputation: 9274
You really should consider not storing NumPy arrays inside DataFrame columns: every operation you do on such a Series is going to be a headache. Instead, expand the arrays into regular DataFrame columns and then use the built-in pandas functionality.
import numpy as np
import pandas as pd
dfnew = pd.DataFrame(np.concatenate([df['name'].values.reshape(-1, 1),
                                     np.array(df['array'].tolist())], axis=1),
                     columns=['name', 'array1', 'array2', 'array3'])
name array1 array2 array3
0 A NaN NaN NaN
1 B 111.426 -743.06 -180.421
Now you can use dropna() to drop the rows that contain NaN:
dfnew.dropna(axis=0)
name array1 array2 array3
1 B 111.426 -743.06 -180.421
You can then always extract a single row back as an array if needed:
dfnew.iloc[1,1:].values
array([111.425818592, -743.060293425, -180.420675659], dtype=object)
Upvotes: 0
Reputation: 164783
Here is one way:
import pandas as pd, numpy as np
df = pd.DataFrame([['A', np.array([np.nan, np.nan, np.nan])],
['B', np.array([111.425818592, -743.060293425, -180.420675659])]],
columns=['name', 'array'])
df = df[~np.all(list(map(np.isnan, df['array'])), axis=1)]
# name array
# 1 B [111.425818592, -743.060293425, -180.420675659]
Or, if you want to remove rows where any value of the array is NaN:
df = df[~np.any(list(map(np.isnan, df['array'])), axis=1)]
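The two masks only differ when an array is partly NaN. A small sketch with a hypothetical third row C, whose middle value alone is NaN, shows that the all version keeps C while the any version drops it:

```python
import numpy as np
import pandas as pd

# Hypothetical row 'C' added for illustration: only its middle value is NaN
df = pd.DataFrame([['A', np.array([np.nan, np.nan, np.nan])],
                   ['B', np.array([111.425818592, -743.060293425, -180.420675659])],
                   ['C', np.array([1.0, np.nan, 3.0])]],
                  columns=['name', 'array'])

# Stack the per-row NaN checks into a 2-D boolean matrix (one row per DataFrame row)
nan_mask = np.array([np.isnan(a) for a in df['array']])

keep_if_not_all_nan = df[~nan_mask.all(axis=1)]  # drops only 'A'
keep_if_no_nan = df[~nan_mask.any(axis=1)]       # drops 'A' and 'C'

print(keep_if_not_all_nan['name'].tolist())
print(keep_if_no_nan['name'].tolist())
```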
Upvotes: 0
Reputation: 323356
Using the sample data from jpp's answer:
df[~pd.DataFrame(df.array.tolist()).isnull().all(1)]
Out[391]:
name array
1 B [111.425818592, -743.060293425, -180.420675659]
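One caveat: pd.DataFrame(df.array.tolist()) gets a fresh 0..n-1 index, so the mask lines up with df only when df itself has a default integer index. With any other index, the boolean indexing can fail or select the wrong rows. A minimal sketch, using a hypothetical variant of the data indexed by name, passes index=df.index so the mask aligns label-for-label:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data, indexed by 'name' instead of 0..n-1
df = pd.DataFrame([['A', np.array([np.nan, np.nan, np.nan])],
                   ['B', np.array([111.425818592, -743.060293425, -180.420675659])]],
                  columns=['name', 'array']).set_index('name')

# Reuse df's own index so the boolean mask aligns with df's row labels
mask = pd.DataFrame(df['array'].tolist(), index=df.index).isnull().all(axis=1)
result = df[~mask]
print(result.index.tolist())
```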
Upvotes: 2