Reputation: 5746
I got a waveform object, define as following:
class wfm:
"""Class defining a waveform characterized by:
- A name
- An electrode configuration
- An amplitude (mA)
- A pulse width (microseconds)"""
def __init__(self, name, config, amp, width=300):
self.name = name
self.config = config
self.amp = amp
self.width = width
def __eq__(self, other):
return type(other) is self.__class__ and other.name == self.name and other.config == self.config and other.amp == self.amp and other.width == self.width
def __ne__(self, other):
return not self.__eq__(other)
Through parsing, I get a list called waveforms with 770 instance of wfm in it. There is a lot of duplicate, and I need to delete them.
My idea was to get the ID of equivalent object, store the largest ID in a list, and then loop on all the waveforms from the end while popping out each duplicate.
Code:
duplicate_ID = []
for i in range(len(waveforms)):
for j in range(i+1, len(waveforms)):
if waveforms[i] == waveforms[j]:
duplicate_ID.append(waveforms.index(waveforms[j]))
print ('{} eq {}'.format(i, j))
duplicate_ID = list(set(duplicate_ID)) # If I don't do that; 17k IDs
Turns out (thx to the print) that I have duplicates that don't appear into the ID list, for instance 750 is a duplicate of 763 (print says it; test too) and yet none of this 2 IDs appears in my duplicate list.
I'm quite sure there is a better solution that this method (which doesn't yet work), and I would be glad to hear it. Thanks for the help!
EDIT: More complicated scenario
I've got a more complicated scenario. I got 2 classes, wfm (see above) and stim:
class stim:
"""Class defining the waveform used for a stimultion by:
- Duration (milliseconds)
- Frequence Hz)
- Pattern of waveforms"""
def __init__(self, dur, f, pattern):
self.duration = dur
self.fq = f
self.pattern = pattern
def __eq__(self, other):
return type(other) is self.__class__ and other.duration == self.duration and other.fq == self.fq and other.pattern == self.pattern
def __ne__(self, other):
return not self.__eq__(other)
I parse my files to fill a dict: paradigm. It looks like that:
paradigm[file name STR] = (list of waveforms, list of stimulations)
# example:
paradigm('myfile.xml') = ([wfm1, ..., wfm10], [stim1, ..., stim5])
Once again, I want to delete the duplicates, i.e. I want to keep only the data where:
Example:
file1 has 10 waveforms and file2 has the same 10 waveforms.
file1 has stim1 and stim2 ; file2 has stim3, sitm 4 and stim 5.
stim1 and stim3 are the same; so since the waveforms are also the same, I want to keep:
file1: 10 waveforms and stim1 and stim2
file2: 10 waveforms and stim 4 and stim5
The correlation is kinda messy in my head, so I got a few difficulties finding the right storage solution for wave forms and stimulation in order to compare them easly. If you got any idea, I'd be glad to hear it. Thanks!
Upvotes: 1
Views: 80
Reputation: 19104
The .index
method uses the .__eq__
method you overloaded. So
waveforms.index(waveforms[j])
will always find the first instance of the waveform in the list containing the same attributes as waveforms[j]
.
w1 = wfm('a', {'test_param': 4}, 3, 2.0)
w2 = wfm('b', {'test_param': 4}, 3, 2.0)
w3 = wfm('a', {'test_param': 4}, 3, 2.0)
w1 == w3 # True
w2 == w3 # False
waveforms = [w1, w2, w3]
waveforms.index(waveforms[2]) == waveforms.index(waveforms[0]) == 0 # True
You don't need to store list indices if you do this immutably:
key = lambda w: hash(str(vars(w)))
dupes = set()
unique = [dupes.add(key(w)) or w for w in waveforms if key(w) not in dupes]
unique == [w1, w2] # True
key = lambda w: hash(str(vars(w)))
seen = set()
idxs = [i if key(w) in seen else seen.add(key(w)) for i, w in enumerate(waveforms)]
for idx in filter(None, idxs[::-1]):
waveforms.pop(idx)
waveforms == [w1, w2] # True
It's a good habit to think about big O complexity when writing algorithms (though optimization should come at the cost of readability only when needed). In this case, these solutions are a bit more readable and also big O optimal.
Your initial solution was O(n^2) due to the double for loop.
Both solutions provided are O(n).
Upvotes: 1