Reputation: 33
I have a pandas dataframe that contains arrays within some of its columns. I'd like to filter the dataframe to only contain rows that have a certain value found in the nested array for that column.
For example, I have a dataframe something like this:
label  MODEL_INDEX     ARRAY_VAL
ID
0                4   (11.0, 0.0)
1               65  (11.0, 10.0)
2               73  (11.0, 10.0)
3               74   (10.0, 0.0)
4               79   (11.0, 0.0)
5               80   (10.0, 0.0)
6               88   (11.0, 0.0)
I'd like to keep only the rows whose array under ARRAY_VAL satisfies some variable condition, say containing 10.0, to get this:
label  MODEL_INDEX     ARRAY_VAL
ID
1               65  (11.0, 10.0)
2               73  (11.0, 10.0)
3               74   (10.0, 0.0)
5               80   (10.0, 0.0)
Essentially, looking for something like:
df[df['ARRAY_VAL'] where 10.0 in df['ARRAY_VAL']]
Upvotes: 3
Views: 9819
Reputation: 2927
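First, build up a list of the index labels of the matching rows:
index = []
for label, row in df['ARRAY_VAL'].items():
    # keep the label of every row whose tuple contains 10.0
    if 10.0 in row:
        index.append(label)
Then select the rows where 10.0 was found in df['ARRAY_VAL']: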
>>> df.loc[index]
   MODEL_INDEX  ARRAY_VAL
1           65   (11, 10)
2           73   (11, 10)
3           74    (10, 0)
5           80    (10, 0)
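The same idea can also be written as a boolean mask rather than a list of collected labels. This is only a sketch: the sample frame, its letter-based index, and the names `mask` and `filtered` are assumptions made for illustration. The non-default index is there to show that the row labels are carried through.

```python
import pandas as pd

# Hypothetical sample data mirroring the question, with a non-default index.
df = pd.DataFrame(
    {
        "MODEL_INDEX": [4, 65, 73, 74, 79, 80, 88],
        "ARRAY_VAL": [
            (11.0, 0.0), (11.0, 10.0), (11.0, 10.0), (10.0, 0.0),
            (11.0, 0.0), (10.0, 0.0), (11.0, 0.0),
        ],
    },
    index=pd.Index(list("abcdefg"), name="ID"),
)

# One True/False per row: does the tuple in that row contain 10.0?
mask = [10.0 in row for row in df["ARRAY_VAL"]]

# Boolean indexing keeps the rows where the mask is True.
filtered = df[mask]
```

Because the mask is positional and has one entry per row, it does not depend on what the index labels happen to be.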
Upvotes: 0
Reputation: 1215
I think apply is needed, since you want to test 10.0 in x for every tuple value x.
df[df['ARRAY_VAL'].apply(lambda x: 10.0 in x)]
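Since the question asks for a variable condition, the one-liner could be wrapped in a small helper. The function name `rows_containing` and the sample data below are hypothetical, introduced only for this sketch.

```python
import pandas as pd

def rows_containing(df, column, value):
    """Return the rows of df whose sequence in `column` contains `value`."""
    # apply() runs the membership test once per cell and yields a boolean Series.
    mask = df[column].apply(lambda cell: value in cell)
    return df[mask]

# Hypothetical sample data mirroring the question.
df = pd.DataFrame(
    {
        "MODEL_INDEX": [4, 65, 73, 74, 79, 80, 88],
        "ARRAY_VAL": [
            (11.0, 0.0), (11.0, 10.0), (11.0, 10.0), (10.0, 0.0),
            (11.0, 0.0), (10.0, 0.0), (11.0, 0.0),
        ],
    }
)

with_ten = rows_containing(df, "ARRAY_VAL", 10.0)
with_eleven = rows_containing(df, "ARRAY_VAL", 11.0)
```

The value to look for is now an ordinary argument, so the same call works for any target.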
Upvotes: 4
Reputation: 324
You can use .apply to search the list in each row of the data frame:
import pandas as pd

# creating the dataframe
df = pd.DataFrame({
    'model_idx': [4, 65, 73, 74, 79, 80, 88],
    'array_val': [[11, 0],
                  [11, 10],
                  [11, 10],
                  [10, 0],
                  [11, 0],
                  [10, 0],
                  [11, 0]],
})
# results is a boolean indicating whether the value is found in the list
results = df.array_val.apply(lambda a: 10 in a)
# filter the dataframe based on the boolean indicator
df_final = df[results]
The filtered data frame is:
In [41]: df_final.head()
Out[41]:
   model_idx  array_val
1         65   [11, 10]
2         73   [11, 10]
3         74    [10, 0]
5         80    [10, 0]
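If every list in the column has the same length, as in this example, an alternative to .apply is to stack the lists into a 2-D NumPy array and test all elements at once. This is a different technique from the one above, sketched under that equal-length assumption; the names `values` and `mask` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "model_idx": [4, 65, 73, 74, 79, 80, 88],
    "array_val": [[11, 0], [11, 10], [11, 10], [10, 0],
                  [11, 0], [10, 0], [11, 0]],
})

# Assumes all lists have equal length, so they stack into a (rows, 2) array.
values = np.array(df["array_val"].tolist())

# True for each row in which at least one element equals 10.
mask = (values == 10).any(axis=1)

df_final = df[mask]
```

With ragged lists the array would not be a regular 2-D array, so the .apply version remains the safer general choice.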
Upvotes: 9