Reputation: 152
I have a pandas DataFrame with a column whose values are NumPy arrays. I am trying to get the rows where that column holds an empty NumPy array, but I can't select them with ordinary pandas indexing. Here is an example DataFrame:
import numpy as np
import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D'], 'stats': [np.array([1,1,1]), np.array([]), np.array([2,2,2]), np.array([])]}
df = pd.DataFrame(data)
I want just the rows where 'stats' is empty (I tried treating that as None), but when I try df[df['stats'] is None] I just get a KeyError: False.
How can I filter for rows that contain an empty array?
Additionally, how can I filter by row where the numpy array is something specific? i.e. get all rows of df where df['stats'] == np.array([1, 1, 1])
Thanks
Upvotes: 0
Views: 238
Reputation: 1658
For this question: "Additionally, how can I filter by row where the numpy array is something specific? i.e. get all rows of df where df['stats'] == np.array([1, 1, 1])"
data = {'Name': ['A', 'B', 'C', 'D'], 'stats': [np.array([1,1,1]), np.array([]), np.array([2,2,2]), np.array([])]}
df = pd.DataFrame(data)
df = df[df['stats'].apply(lambda x: np.array_equal(x, np.array([1,1,1])))]
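To make it easier to see why this works, here is a minimal sketch that keeps the boolean mask in its own variable instead of overwriting df. The names target, mask and result are my own, not from the question. np.array_equal returns a single bool per row and gives False when the shapes differ, so the empty arrays are simply excluded rather than raising.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'stats': [np.array([1, 1, 1]), np.array([]), np.array([2, 2, 2]), np.array([])],
})

# array we are looking for (illustrative name)
target = np.array([1, 1, 1])

# one True/False per row; arrays of a different shape compare as False
mask = df['stats'].apply(lambda arr: np.array_equal(arr, target))

# select the matching rows without modifying df
result = df[mask]
print(result['Name'].tolist())
```

Keeping the mask separate also means the original df is still available for further filters.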
Upvotes: 1
Reputation: 863166
You can check the length with Series.str.len, because it works with all iterables:
print (df['stats'].str.len())
0 3
1 0
2 3
3 0
Name: stats, dtype: int64
Then filter, e.g. the rows with length 0:
# assign to a new name so the original df stays available for the next filter
df1 = df[df['stats'].str.len().eq(0)]
# alternative
# df1 = df[df['stats'].apply(len).eq(0)]
print (df1)
  Name stats
1    B    []
3    D    []
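As a self-contained sketch of the same idea, the snippet below rebuilds the example frame from the question and uses the apply(len) variant. The names identity_check, empty_mask and empty_rows are just illustrative. It also shows where the KeyError: False in the question comes from: `is` compares the whole Series object with None, which gives a plain False, and pandas then tries to look up False as a column label.

```python
import numpy as np
import pandas as pd

data = {
    'Name': ['A', 'B', 'C', 'D'],
    'stats': [np.array([1, 1, 1]), np.array([]), np.array([2, 2, 2]), np.array([])],
}
df = pd.DataFrame(data)

# `is` tests identity of the Series object itself, so this is a single bool,
# not a row-by-row mask; df[False] is then treated as a column lookup
identity_check = df['stats'] is None

# element-wise length check: True where the array has no elements
empty_mask = df['stats'].apply(len).eq(0)

# keep the result under a new name so df is left unchanged
empty_rows = df[empty_mask]
print(empty_rows['Name'].tolist())
```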
If you need to test for a specific array, you can use tuples:
df = df[df['stats'].apply(tuple) == tuple(np.array([1, 1, 1]))]
print (df)
  Name      stats
0    A  [1, 1, 1]
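One thing the tuple conversion makes possible, sketched here under my own assumptions, is matching against several arrays at once, since tuples are hashable and can go in a set. The set targets and the names as_tuples, target_mask and matched are hypothetical and not part of the question. The membership test is done per row inside map, so it does not rely on how a Series is compared with a tuple.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'stats': [np.array([1, 1, 1]), np.array([]), np.array([2, 2, 2]), np.array([])],
})

# hypothetical set of arrays to look for, written as plain tuples
targets = {(1, 1, 1), (2, 2, 2)}

# convert each array to a tuple; empty arrays become ()
as_tuples = df['stats'].apply(tuple)

# True where the row's tuple is one of the targets
target_mask = as_tuples.map(lambda t: t in targets)

matched = df[target_mask]
print(matched['Name'].tolist())
```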
Upvotes: 1