Reputation: 22933
I have a set of around 6 000 packets which for comparison purposes I represent as strings (first 28 bytes) to compare against just as many packets, which I also represent as strings of 28 bytes.
I have to match each packet of one set with all of the other. Matchings are always unique.
I found that comparing strings takes a bit of time. Is there any way to speed up the process?
EDIT1: I wouldn't like to permutate string elements because I am always making sure that ordering between packet list and corresponding string list is preserved.
EDIT2: Here's my implementation:
list1, list2 # list of packets (no duplicates present in each list!)
listOfStrings1, listOfStrings2 # corresponding list of strings. Ordering is preserved.
alreadyMatchedlist2Indices = []
for list1Index in xrange(len(listOfStrings1)):
stringToMatch = listOfStrings1[list1Index]
matchinglist2Indices = [i for i, list2Str in enumerate(listOfStrings2)
if list2Str == stringToMatch and i not in alreadyMatchedlist2Indices]
if not matchinglist2Indices:
tmpUnmatched.append(list1Index)
elif len(matchinglist2Indices) == 1:
tmpMatched.append([list1Index, matchinglist2Indices[0]])
alreadyMatchedlist2Indices.append(matchinglist2Indices[0])
else:
list2Index = matchinglist2Indices[0] #taking first matching element anyway
tmpMatched.append([list1Index, list2Index])
alreadyMatchedlist2Indices.append(list2Index)
Upvotes: 2
Views: 3003
Reputation: 552
what's wrong with:
matches = [packet for packet in list1 if packet in list2]
Upvotes: 0
Reputation: 151157
Here's a simple linear time approach -- at least if I understand your question correctly:
>>> def get_matches(a, b):
... reverse_map = {x:i for i, x in enumerate(b)}
... return [(i, reverse_map[x]) for i, x in enumerate(a) if x in reverse_map]
...
>>> get_matches(['a', 'b', 'c'], ['c', 'd', 'e'])
[(2, 0)]
This accepts two sequences of strings, a
and b
, and returns a list of matches represented as tuples of indices into a
and b
. This is O(n + m) where m and n are the lengths of a
and b
.
Upvotes: 4
Reputation: 25011
---Here I'm assuming you're taking every strings one by one and comparing to all others.---
I suggest sorting your list of string and comparing neighboring strings. This should have a runtime of O(nlogn).
Upvotes: 5