I have a pd.DataFrame containing a mask and an np.array in each row. I want to apply the mask to the array (like I would do with np.where).
Does anyone have an idea how to do this?
import pandas as pd
import numpy as np
df = pd.DataFrame({'Mask': [[True, False, True], [False, False], [True, True]],
                   'Array': [[2, 5, 4], [1, 0], [4, 5]],
                   'Result': [[2, 4], [], [4, 5]]})
def ffilter(entry):
    return entry['Array']['Mask']
df.apply(ffilter)  # --> Nope, too easy :-(
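A minimal sketch of why this attempt fails and how a row-wise version could look: by default DataFrame.apply passes each column to the function, so entry['Array'] looks up a row label 'Array' and fails; and even row-wise, entry['Array'] is a plain Python list, which cannot be indexed with the string 'Mask'. Passing axis=1 and converting the mask to a boolean NumPy array does the filtering. The helper name ffilter_rows is hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Mask': [[True, False, True], [False, False], [True, True]],
                   'Array': [[2, 5, 4], [1, 0], [4, 5]],
                   'Result': [[2, 4], [], [4, 5]]})

def ffilter_rows(row):
    # Convert both lists to arrays so boolean indexing works like np.where
    values = np.asarray(row['Array'])
    keep = np.asarray(row['Mask'], dtype=bool)
    return values[keep].tolist()

# axis=1 calls the function once per row instead of once per column
filtered = df.apply(ffilter_rows, axis=1)
print(filtered.tolist())
```

Returning plain lists makes apply produce a Series of lists, one per row, which can be compared directly with the Result column.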
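You could build a mask from df.Mask, pass it to the DataFrame's mask() method (which replaces every entry where the mask is True with NaN) and then aggregate.
This would be the "one-liner" (it assumes all lists have the same length, as in the full example below):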
pd.DataFrame(df.Array.tolist())\
.mask(np.asarray(df.Mask.tolist()))\
.agg(['mean', 'std', 'min', 'max'])
which gives you:
        0         1
mean  1.0  2.500000
std   NaN  3.535534
min   1.0  0.000000
max   1.0  5.000000
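The NaN in the std row comes from pandas' default sample standard deviation (ddof=1): column 0 keeps only the single value 1, so the denominator n - 1 is zero, while column 1 keeps the two values 5 and 0:

```latex
\bar{x} = \frac{5 + 0}{2} = 2.5, \qquad
s = \sqrt{\frac{(5 - 2.5)^2 + (0 - 2.5)^2}{2 - 1}} = \sqrt{12.5} \approx 3.535534
```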
Or as a whole:
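import pandas as pd
import numpy as np

df = pd.DataFrame({'Mask': [[True, False], [False, False], [True, True]],
                   'Array': [[2, 5], [1, 0], [4, 5]],
                   'Result': [[2], [], [4, 5]]})

# One column per list position (all lists have the same length here)
df_Array = pd.DataFrame(df.Array.tolist())
# 2-D boolean array with the same shape as df_Array
mask = np.asarray(df.Mask.tolist())
# mask() sets entries to NaN where mask is True, then aggregates per column
print(df_Array.mask(mask).agg(['mean', 'std', 'min', 'max']))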
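To keep the entries flagged True (as in the question's Result column) rather than hide them, DataFrame.where is the counterpart of DataFrame.mask. A hedged sketch that also copes with the question's ragged lists: pd.DataFrame(df.Array.tolist()) pads short rows with missing values, so the mask is padded to the same shape, here by treating every missing position as "drop" via eq(True). Aggregating along axis=1 then yields one row of statistics per original array.

```python
import pandas as pd

df = pd.DataFrame({'Mask': [[True, False, True], [False, False], [True, True]],
                   'Array': [[2, 5, 4], [1, 0], [4, 5]],
                   'Result': [[2, 4], [], [4, 5]]})

# Ragged lists become a rectangular frame; missing positions are NaN
values = pd.DataFrame(df.Array.tolist())
# Pad the mask the same way; eq(True) turns missing positions into False
keep = pd.DataFrame(df.Mask.tolist()).eq(True)

# where() keeps entries whose mask is True; mask() would do the opposite
kept = values.where(keep)
# axis=1 aggregates each row, i.e. each original array
stats = kept.agg(['min', 'max', 'std', 'mean'], axis=1)
print(stats)
```

Because NaN is skipped by min, max, std and mean, the padded positions do not distort the statistics, and a row whose mask is all False comes out as NaN.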
From the comments, it is still not clear what your desired output is. I'll just assume you want to calculate statistics such as min, max, std, etc. for each of the arrays in your data frame and, further, get a data frame in which each row represents one of those arrays:
import pandas as pd

df = pd.DataFrame({'Mask': [[True, False, True], [False, False], [True, True]],
                   'Array': [[2, 5, 4], [1, 0], [4, 5]],
                   'Result': [[2, 4], [], [4, 5]]})

# For each row, keep the Array values where Mask is True and summarise them
df_stats = df.apply(lambda x: pd.Series(x.Array)[x.Mask]
                    .agg(['min', 'max', 'std', 'mean']), axis=1)
print(df_stats)
which produces:
   min  max       std  mean
0  2.0  4.0  1.414214   3.0
1  NaN  NaN       NaN   NaN
2  4.0  5.0  0.707107   4.5
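For comparison, a sketch of the same per-row statistics without a per-row lambda, assuming pandas 1.3 or later (needed to explode several columns at once): explode the two list columns in parallel, filter on the mask, and group by the original row label. Rows whose mask is all False drop out of the filter, so reindex restores them as NaN. The variable names pairs and kept are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Mask': [[True, False, True], [False, False], [True, True]],
                   'Array': [[2, 5, 4], [1, 0], [4, 5]],
                   'Result': [[2, 4], [], [4, 5]]})

# One row per (array element, mask flag) pair, keeping the original index
pairs = df[['Array', 'Mask']].explode(['Array', 'Mask'])
# eq(True) also treats NaN (produced by exploding empty lists) as "drop"
kept = pairs.loc[pairs['Mask'].eq(True), 'Array'].astype(float)

# Aggregate per original row; all-False rows reappear as NaN after reindex
df_stats = (kept.groupby(level=0)
                .agg(['min', 'max', 'std', 'mean'])
                .reindex(df.index))
print(df_stats)
```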
That does the trick, even if it's not really Pythonic:
import numpy as np

arr = df.Array.tolist()
mask = df.Mask.tolist()
# Boolean-index each array with its own mask
result = [np.asarray(a)[m] for a, m in zip(arr, mask)]
print(result)
# [array([2, 4]), array([], dtype=int64), array([4, 5])]
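If the lists are long, one hedged alternative to indexing row by row is to flatten everything once with np.concatenate, apply a single boolean mask, and cut the result back into rows with np.split; the split points are the cumulative counts of True values per row. This assumes every Mask list has the same length as its Array list, and the names flat_values, flat_mask and kept_per_row are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Mask': [[True, False, True], [False, False], [True, True]],
                   'Array': [[2, 5, 4], [1, 0], [4, 5]],
                   'Result': [[2, 4], [], [4, 5]]})

# Flatten all rows into one value array and one boolean mask
flat_values = np.concatenate([np.asarray(a) for a in df.Array])
flat_mask = np.concatenate([np.asarray(m, dtype=bool) for m in df.Mask])

# Number of kept elements per row; their running sum gives the split points
kept_per_row = np.array([np.count_nonzero(m) for m in df.Mask])
result = np.split(flat_values[flat_mask], np.cumsum(kept_per_row)[:-1])
print(result)
```

The masking itself happens in one NumPy operation; only the conversion and counting loop over rows.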