EmJ

Reputation: 4618

How to efficiently search a list in python

I have a dictionary with only 4 keys (mydictionary) and a list (mynodes) as follows.

    mydictionary = {
        0: {('B', 'E', 'G'), ('A', 'E', 'G'), ('A', 'E', 'F'), ('A', 'D', 'F'), ('C', 'D', 'F'), ('C', 'E', 'F'), ('A', 'D', 'G'), ('C', 'D', 'G'), ('C', 'E', 'G'), ('B', 'E', 'F')},
        1: {('A', 'C', 'G'), ('E', 'F', 'G'), ('D', 'E', 'F'), ('A', 'F', 'G'), ('A', 'B', 'G'), ('B', 'D', 'F'), ('C', 'F', 'G'), ('A', 'C', 'E'), ('D', 'E', 'G'), ('B', 'F', 'G'), ('B', 'C', 'G'), ('A', 'C', 'D'), ('A', 'B', 'F'), ('B', 'D', 'G'), ('B', 'C', 'F'), ('A', 'D', 'E'), ('C', 'D', 'E'), ('A', 'C', 'F'), ('A', 'B', 'E'), ('B', 'C', 'E'), ('D', 'F', 'G')},
        2: {('B', 'D', 'E'), ('A', 'B', 'D'), ('B', 'C', 'D')},
        3: {('A', 'B', 'C')},
    }

    mynodes = ['E', 'D', 'G', 'F', 'B', 'A', 'C']

For each node in mynodes, I want to count how many of the tuples stored under each key of mydictionary contain that node. For example, consider the above dictionary and list.

The output should be:

{'E': [(0, 6), (1, 8), (2, 1), (3, 0)], 
'D': [(0, 4), (1, 8), (2, 3), (3, 0)], 
'G': [(0, 5), (1, 10), (2, 0), (3, 0)], 
'F': [(0, 5), (1, 10), (2, 0), (3, 0)], 
'B': [(0, 2), (1, 9), (2, 3), (3, 1)], 
'A': [(0, 4), (1, 9), (2, 1), (3, 1)], 
'C': [(0, 4), (1, 9), (2, 1), (3, 1)]}

For example, consider E. It appears 6 times under key 0, 8 times under key 1, 1 time under key 2 and 0 times under key 3.

My current code is as follows.

    triad_class_for_nodes = {}

    for node in mynodes:
        temp_list = []

        for key, value in mydictionary.items():
            temp_counting = 0

            for triad in value:
                if node in triad:
                    temp_counting = temp_counting + 1
            temp_list.append((key, temp_counting))

        triad_class_for_nodes[node] = temp_list
    print(triad_class_for_nodes)

This works fine with the small dictionary values.

However, in my real dataset, I have millions of tuples in the value list for each of my 4 keys in my dictionary. Hence, my existing code is really inefficient and takes days to run.

When I searched for how to make this more efficient, I came across this question (Fastest way to search a list in python), which suggests converting the list of values to a set. I tried this as well. However, it also takes days to run.

I am just wondering if there is a more efficient way of doing this in python. I am happy to transform my existing data formats into different structures (such as pandas dataframe) to make things more efficient.

A small sample of mydictionary and mynodes is attached below for testing purposes. https://drive.google.com/drive/folders/15Faa78xlNAYLPvqS3cKM1v8bV1HQzW2W?usp=sharing

mynodes: see nodes.txt

    import ast

    with open("nodes.txt", "r") as file:
        mynodes = ast.literal_eval(file.read())

I am happy to provide more details if needed.

Upvotes: 2

Views: 143

Answers (2)

Alain T.

Reputation: 42139

If you're not using pandas, you could do this with Counter from collections:

from collections import Counter,defaultdict
from itertools import product
counts = Counter((c,k) for k,v in mydictionary.items() for t in v for c in t )
result = defaultdict(list)
for c,k in product(mynodes,mydictionary):
    result[c].append((k,counts[(c,k)]))

print(result)
{'E': [(0, 6), (1, 8), (2, 1), (3, 0)],
 'D': [(0, 4), (1, 8), (2, 3), (3, 0)],
 'G': [(0, 5), (1, 10), (2, 0), (3, 0)],
 'F': [(0, 5), (1, 10), (2, 0), (3, 0)],
 'B': [(0, 2), (1, 9), (2, 3), (3, 1)],
 'A': [(0, 4), (1, 9), (2, 1), (3, 1)],
 'C': [(0, 4), (1, 9), (2, 1), (3, 1)]}

Counter will manage counting instances for each combination of mydictionary key and node. You can then use these counts to create the expected output.

EDIT Expanded counts line:

counts = Counter()                          # initialize Counter() object
for key,tupleSet in mydictionary.items():   # loop through dictionary
    for tupl in tupleSet:                   # loop through tuple set of each key
        for node in tupl:                   # loop through node character in each tuple
            counts[(node,key)] += 1         # count 1 node/key pair
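
For reference, the counting pass can be wrapped in a small function. This is only a sketch: the function name `count_nodes_per_key` and the sample data are made up for illustration and are not from the question. It visits every tuple once in total, whereas the nested loops in the question visit every tuple once per node. It also iterates over `set(triad)` so that a node repeated inside one tuple is counted once, which matches the `node in triad` test in the question (an assumption that only matters if your tuples can contain duplicates).

```python
from collections import Counter


def count_nodes_per_key(triads_by_key, nodes):
    """Return {node: [(key, number of tuples under key containing node), ...]}."""
    counts = Counter()
    for key, triads in triads_by_key.items():
        for triad in triads:
            # set(triad) counts a node once per tuple, like `node in triad`
            for node in set(triad):
                counts[(node, key)] += 1
    # A Counter returns 0 for pairs it never saw, so absent nodes still get entries
    return {
        node: [(key, counts[(node, key)]) for key in triads_by_key]
        for node in nodes
    }


# Hypothetical sample data, much smaller than the real dataset
sample = {
    0: {('A', 'B', 'C'), ('A', 'D', 'E')},
    1: {('B', 'C', 'D')},
}
result = count_nodes_per_key(sample, ['A', 'B', 'Z'])
print(result)
# {'A': [(0, 2), (1, 0)], 'B': [(0, 1), (1, 1)], 'Z': [(0, 0), (1, 0)]}
```

If the tuples are guaranteed to hold distinct nodes, `set(triad)` can be replaced by `triad` to save the extra set construction.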

Upvotes: 1

BENY

Reputation: 323386

Since you tagged pandas: first convert your dict to a pandas DataFrame, then stack it and use crosstab.

import pandas as pd

s = pd.DataFrame.from_dict(mydictionary, 'index').stack()

s = pd.DataFrame(s.values.tolist(), index=s.index).stack()
pd.crosstab(s.index.get_level_values(0),s)
col_0  A  B  C  D  E   F   G
row_0                       
0      4  2  4  4  6   5   5
1      9  9  9  8  8  10  10
2      1  3  1  3  1   0   0
3      1  1  1  0  0   0   0

Update: to get the dictionary format shown in the question:

s=pd.crosstab(s.index.get_level_values(0), s).stack().reset_index()

s[['row_0',0]].apply(tuple,1).groupby(s['col_0']).agg(list).to_dict()

Upvotes: 1
