wolverinejohn

Reputation: 29

For loop outputting duplicates

a = {'1330': ('John', 'Gold', '1330'), "0001":('Matt', 'Wade', '0001'), '2112': ('Bob', 'Smith', '2112')}
com = {'6':['John Gold, getting no points', 'Matt played in this game? Didn\'t notice him','Love this shot!']}
comments_table = []

What I am trying to achieve with this replacer function is to replace people's names in the strings found in com (a dict) with a code unique to each person, which is found in a (a dict), using regex. Replacing the name with the code works, but adding the new string, with the code in place of the name, to the result list is where I am going wrong.

def replace_first_name():
    for k,v in a.items():
        for z, y in com.items():
            for item in y:
                firstname = a[k][0]
                lastname = a[k][1]
                full_name = firstname + ' ' + lastname
                if firstname in item:
                    if full_name in item:
                        t = re.compile(re.escape(full_name), re.IGNORECASE)
                        comment = t.sub(a[k][2], item)
                        print ('1')
                        comments_table.append({
                            'post_id': z, 'comment': comment
                        })
                        continue
                    else:
                        t = re.compile(re.escape(firstname), re.IGNORECASE)
                        comment = t.sub(a[k][2], item)
                        print ('2')
                        comments_table.append({
                            'post_id':z, 'comment':comment
                        })
                else:
                    print ('3')
                    if fuzz.ratio(item,item) > 90:
                        comments_table.append({
                            'post_id': z, 'comment': item
                        })
                    else:
                        pass

The problem is with the output as seen below:

[{'comment': '1330, getting no points', 'post_id': '6'}, {'comment': "Matt played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}, {'comment': 'John Gold, getting no points', 'post_id': '6'}, {'comment': "Matt played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}, {'comment': 'John Gold, getting no points', 'post_id': '6'}, {'comment': "0001 played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}]

I don't want comments that already have their name replaced with the number to make their way into the final list. Therefore, I want my expected output to look like this:

[{'comment': '1330, getting no points', 'post_id': '6'}, {'comment': "0001 played in this game? Didn't notice him", 'post_id': '6'}, {'comment': 'Love this shot!', 'post_id': '6'}]

I have looked into using an iterator by making y an iter_list, but I didn't get anywhere. Any help would be appreciated. Thanks!

Upvotes: 0

Views: 63

Answers (1)

Anders Sandvig

Reputation: 20986

Not sure why you are doing the regexp replace, since you are already checking whether the first name/full name is present with the in operator. I'm also not sure what the fuzz.ratio(item, item) check in case 3 is supposed to do, but here's how you can do the simple/naive replacement:

#!/usr/bin/python
import re

def replace_names(authors, com):
    res = []
    for post_id, comments in com.items():
        for comment in comments:
            for author_id, author in authors.items():
                first_name, last_name = author[0], author[1]
                full_name = first_name + ' ' + last_name
                if full_name in comment:
                    comment = comment.replace(full_name, author_id)
                    break
                elif first_name in comment:
                    comment = comment.replace(first_name, author_id)
                    break
            res.append({'post_id': post_id, 'comment': comment})
    return res

a = {'1330': ('John', 'Gold', '1330'), "0001":('Matt', 'Wade', '0001'), '2112': ('Bob', 'Smith', '2112')}
com = {'6':['John Gold, getting no points', 'Matt played in this game? Didn\'t notice him','Love this shot!']}
for comment in replace_names(a, com):
    print(comment)

Which produces this output:

{'comment': '1330, getting no points', 'post_id': '6'}
{'comment': "0001 played in this game? Didn't notice him", 'post_id': '6'}
{'comment': 'Love this shot!', 'post_id': '6'}
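
If you do want to keep the case-insensitive matching from your regex, something along these lines might work. This is only a sketch: the name replace_names_ci, the word-boundary pattern, and the sample data are my own assumptions and not taken from your code.

```python
import re

def replace_names_ci(authors, com):
    """Hypothetical variant: case-insensitive, whole-word name replacement."""
    res = []
    for post_id, comments in com.items():
        for comment in comments:
            for author_id, author in authors.items():
                first_name, last_name = author[0], author[1]
                # Match the first name, optionally followed by the last name,
                # so "John Gold" and "John" are both covered by one pattern.
                pattern = re.compile(
                    r'\b' + re.escape(first_name) +
                    r'(?:\s+' + re.escape(last_name) + r')?\b',
                    re.IGNORECASE)
                # subn returns the new string and the number of replacements,
                # so no separate "in" check is needed.
                comment, count = pattern.subn(lambda m: author_id, comment)
                if count:
                    break
            res.append({'post_id': post_id, 'comment': comment})
    return res

# Made-up sample data to show the case-insensitive behaviour.
authors = {'1330': ('John', 'Gold', '1330'), '0001': ('Matt', 'Wade', '0001')}
sample = {'6': ['john gold, getting no points', 'MATT played', 'Matthew scored']}
for entry in replace_names_ci(authors, sample):
    print(entry)
```

The word boundaries are there so that a first name such as Matt is not replaced inside a longer word such as Matthew, which the plain in check would not catch.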

It's a bit tricky to understand what your intention is with the original code, but (one of) the reason(s) you are getting duplicates is that you are processing authors in the outer loop, which means each comment is processed once for every author. By swapping the loops you ensure that each comment is processed only once.
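
To make the effect of the loop order concrete, here is a stripped-down sketch. The helper names authors_outer and comments_outer are made up for illustration, and the replacement logic is simplified compared to both your code and mine.

```python
authors = {'1330': ('John', 'Gold'), '0001': ('Matt', 'Wade'), '2112': ('Bob', 'Smith')}
comments = ['John Gold, getting no points',
            "Matt played in this game? Didn't notice him",
            'Love this shot!']

def authors_outer(authors, comments):
    """Appends once per (author, comment) pair, like the original loop order."""
    out = []
    for author_id, (first, last) in authors.items():
        for comment in comments:
            replaced = comment.replace(first + ' ' + last, author_id)
            out.append(replaced.replace(first, author_id))
    return out

def comments_outer(authors, comments):
    """Appends once per comment, after every author has been tried."""
    out = []
    for comment in comments:
        for author_id, (first, last) in authors.items():
            comment = comment.replace(first + ' ' + last, author_id)
            comment = comment.replace(first, author_id)
        out.append(comment)
    return out

print(len(authors_outer(authors, comments)))   # 9: 3 authors x 3 comments
print(len(comments_outer(authors, comments)))  # 3: one entry per comment
```

With three authors and three comments, the first version yields nine entries, which matches the length of the list you posted.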

You may also have intended to have a break where you have the continue, but I'm not entirely sure I understand how your original code is supposed to work.
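
In your code the continue is the last statement executed in that branch, so it changes nothing: the loop would move on to the next item anyway. As a generic sketch of the difference (the visited helper and the sample list are made up for illustration):

```python
def visited(use_break):
    """Return what the loop records when using break or continue on a match."""
    seen = []
    for item in ['a', 'match', 'b']:
        if item == 'match':
            seen.append('hit')
            if use_break:
                break      # leaves the loop; 'b' is never looked at
            continue       # skips to the next item; 'b' is still processed
        seen.append(item)
    return seen

print(visited(use_break=False))  # ['a', 'hit', 'b']
print(visited(use_break=True))   # ['a', 'hit']
```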

The use of global variables is also a bit confusing.

Upvotes: 2
