Reputation: 2216
I have a list of conditions to run against a dataset in order to filter a large amount of data.
df = a huge DataFrame.
E.g.:
Index D1 D2 D3 D5 D6
0 8 5 0 False True
1 45 35 0 True False
2 35 10 1 False True
3 40 5 2 True False
4 12 10 5 False False
5 18 15 13 False True
6 25 15 5 True False
7 35 10 11 False True
8 95 50 0 False False
I have to filter the above df based on the given orders:
orders = [[A, B],[D, ~E, B], [~C, ~A], [~C, A]...]
#(where A, B, C , D, E are the conditions)
eg.
A = df['D1'].le(50)
B = df['D2'].ge(5)
C = df['D3'].ne(0)
D = df['D1'].ne(False)
E = df['D1'].ne(True)
# In the real scenario, I have 64 such conditions to be run on 5 million records.
That is, I have to apply all of these conditions to get the final output.
What is the easiest way to achieve the following task, applying the orders with a for loop, map, or .apply?
df = df.loc[A & B]
df = df.loc[D & ~E & B]
df = df.loc[~C & ~A]
df = df.loc[~C & A]
The resulting df would be my expected output.
Here I am more interested in how you would use a loop, map, or .apply to run multiple conditions that are stored in a list, not in the resulting output itself.
Such as:
for i in orders:
    df = df[all(i)]  # I am not able to implement this logic for each order
Upvotes: 0
Views: 173
Reputation: 150755
You are looking for a bitwise AND of all the elements inside orders. In that case:
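import numpy as np

# Stack every condition from every order into one 2-D boolean array,
# then AND down the columns to get a single row mask.
df = df[np.concatenate(orders).all(0)]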
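If you would rather keep an explicit loop over orders, as the question asks, a minimal sketch could look like the following. It assumes every condition is a boolean Series computed on the original df. The sample frame is rebuilt from the table in the question, and the two-element orders list is only an illustrative subset, not the real 64 conditions.

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "D1": [8, 45, 35, 40, 12, 18, 25, 35, 95],
    "D2": [5, 35, 10, 5, 10, 15, 15, 10, 50],
    "D3": [0, 0, 1, 2, 5, 13, 5, 11, 0],
    "D5": [False, True, False, True, False, False, True, False, False],
    "D6": [True, False, True, False, False, True, False, True, False],
})

A = df["D1"].le(50)
B = df["D2"].ge(5)
C = df["D3"].ne(0)

# Illustrative subset of orders (an assumption for this sketch).
orders = [[A, B], [~C, A]]

# Loop version: AND the conditions inside each order, then filter step by step.
looped = df
for order in orders:
    step_mask = reduce(operator.and_, order)
    # Restrict the mask to the rows that survived the previous steps.
    looped = looped.loc[step_mask.loc[looped.index]]

# One-shot version: AND every condition from every order at once.
vectorized = df[np.concatenate(orders).all(0)]

print(looped)
```

Both versions keep the same rows, because filtering by one mask and then another is equivalent to filtering once by their AND. On millions of rows the one-shot mask is likely cheaper, since it avoids creating an intermediate DataFrame per order.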
Upvotes: 1