Reputation: 423
One of the most frustrating things about learning a new language is, well, that you don't know how to do anything. I want to perform what should be a simple task, but I am struggling to implement it.
I want to keep track of items that I have already traversed, along with their location. When I find an item, I want to be able to look back into the collection, see if it has already been seen, and if so, what its location (line number) was. I then want to look at the last item found before the current one and review its name and location.
I am parsing some unstructured text, and sometimes I match on unintended parts of word sections.
Take the following:
'Item 1', 150
'Item 2', 340
'Item 3', 794
'Item 4', 1205
'Item 5', 1869
'Item 2', 3412 <-- I've seen 2, so I want to inspect the item before it (5, 1869)
My idea is to test the distance between 2 and 5 and determine whether it's noise. In this scenario, I would want to drop (Item 2, 3412) because 2 should come before 5, line 3412 is a long way from the previous 2 (line 340), and there are also sequential items between the "last seen" item and this one.
Of course, if anyone has a better idea, I am all for that as well.
I have no idea how to walk a collection in Python. I'm not even sure what type of collection I should be using. I seem to be favoring lists of paired tuples at the moment, but that's probably just me being silly.
Any guidance is appreciated.
for line_num, line in enumerate(all_lines):
    # matching requires back-tracking - we will always be at least 1 line behind loop
    if line_num < 1:
        continue
    blob = ''.join(all_lines[line_num : line_num + _blob_length_])
    # evaluate text against match expressions
    matches = self.match_patterns_sb(blob) if is_sb_edition else self.match_patterns(blob)
    # iterate each pattern and test if match was successful
    for pattern in matches.iterkeys():
        if matches[pattern] and line_num >= last_line_matched + 1:  # Try not to rematch
            if pattern == last_matched_pattern and line_num < (last_line_matched + 2):
                continue
            # store match info in a local tuple nested within a higher level list
            if '(continued)' not in blob.lower() and '( continued )' not in blob.lower():
                print '{0} - {1}'.format(pattern, line_num)
                '''
                At this point I want to look into last_seen, and
                1) Get the last seen item that matches this one ('Item 2')
                2) Get the last item added into last_seen
                3) do some calculations
                '''
                last_seen[pattern] = line_num
                if pattern in dict(section_items).keys():
                    test = dict(section_items)
                    existing_line = test[pattern]
                    print '{0} exists with LINE NUMBER {1}'.format(pattern, existing_line)
                section_items.append((pattern, line_num))
                # track last match
                last_line_matched = line_num
                last_matched_pattern = pattern
# order and normalize the item matches
fixed_list = OrderedDict(self.sorted_nicely(section_items, itemgetter(0))).items()
Upvotes: 0
Views: 97
Reputation: 25974
Accumulate things as you go using a dict, checking each time whether you've seen the item before.
sequence = [('item 1', 150), ('item 2', 340), ('item 3', 794), ('item 4', 1205), ('item 5', 1869), ('item 2', 3412)]
d = {}
for i, tup in enumerate(sequence):
    item, val = tup
    if item in d:
        print("I've seen {} before, it was {} at index {}".format(item, *d[item]))
    d[item] = (val, i)
# I've seen item 2 before, it was 340 at index 1
d will always hold the last time you've seen item; d.get(item) returns None if you haven't seen it yet.
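To connect this back to the noise check described in the question, here is a rough sketch that keeps the same kind of dict plus a variable for the most recently kept item. The function name drop_noisy_repeats and the max_gap threshold of 1000 lines are my own assumptions for illustration, not something from the question, so tune the rule to your data.

```python
def drop_noisy_repeats(matches, max_gap=1000):
    """Return matches with suspicious repeats removed.

    max_gap is a hypothetical threshold, not a value from the question.
    """
    last_seen = {}   # item name -> line number where it was last kept
    previous = None  # the most recently kept (name, line_num) pair
    kept = []
    for name, line_num in matches:
        if name in last_seen and previous is not None:
            far_away = line_num - last_seen[name] > max_gap
            other_items_between = previous[0] != name
            if far_away and other_items_between:
                continue  # treat as noise: skip without updating any state
        last_seen[name] = line_num
        previous = (name, line_num)
        kept.append((name, line_num))
    return kept


matches = [('Item 1', 150), ('Item 2', 340), ('Item 3', 794),
           ('Item 4', 1205), ('Item 5', 1869), ('Item 2', 3412)]
print(drop_noisy_repeats(matches))  # ('Item 2', 3412) is dropped
```

Here previous answers "what was the last item added", and last_seen answers "where did I last see this one", which are the two lookups the question asks for.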
If you need to keep track of all the times you've seen item in the past, move up to a defaultdict to accumulate (val, i) tuples into a list for you.
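A minimal sketch of the defaultdict version, using the same sample data as above. The variable name history is just my choice; each key maps to a list of (val, i) tuples, oldest first.

```python
from collections import defaultdict

sequence = [('item 1', 150), ('item 2', 340), ('item 3', 794),
            ('item 4', 1205), ('item 5', 1869), ('item 2', 3412)]

history = defaultdict(list)  # item -> list of (val, i) tuples, oldest first
for i, (item, val) in enumerate(sequence):
    if item in history:
        # history[item][-1] is the most recent earlier sighting
        prev_val, prev_i = history[item][-1]
        print("I've seen {} before, last at {} (index {})".format(item, prev_val, prev_i))
    history[item].append((val, i))

print(history['item 2'])
```

Checking with item in history rather than history[item] avoids creating an empty list for items that have not been seen yet.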
Upvotes: 3