Reputation: 407
I will like to find the most optimized way of doing search on Pandas DataFrame
For example
I will like to search for value
aaa = 9 in the dataframe df
index Column value
1 aaa 1
2 aaa 3
3 aaa 5
4 aaa -3
5 aaa 3
6 aaa 0
It should result in the following
Answer: Combinations of Index locations (1,2,3), (1,2,3,4,5),(1,2,3,6),(1,2,3,4,5,6), (1,3,5), (1,3,5,6)
as they all sum upto 9
I am avoiding to run permutation loop for n^n times for all combinations. Any short cuts will be highly appreciated
Upvotes: 1
Views: 1485
Reputation: 49
Adapted jpp's answer:
import pandas as pd
import itertools
df = pd.DataFrame([['aaa', 1],['aaa',3], ['aaa',5], ['aaa',-3], ['aaa', 3], ['aaa',0]], columns=['A', 'B'],index=[1,2,3,4,5,6])
res = [i for j in range(1,df.index.size+1) for i in itertools.combinations(df.index,j) if df.loc[i, 'B'].sum() == 9]
print res
[(1, 2, 3), (1, 3, 5), (1, 2, 3, 6), (1, 3, 5, 6), (1, 2, 3, 4, 5), (1, 2, 3, 4, 5, 6)]
Upvotes: 1
Reputation: 164773
Here's the brute-force method using itertools.combinations
. You can optimize by using a generator and stop summing when values exceed a total of 9.
from itertools import combinations
d = df.set_index('index')['value'].to_dict()
n = len(d)
res = [i for j in range(n) for i in combinations(d, j) if sum(map(d.get, i)) == 9]
print(res)
[(1, 2, 3), (1, 3, 5), (1, 2, 3, 6), (1, 3, 5, 6), (1, 2, 3, 4, 5)]
Upvotes: 2