goldisfine

Reputation: 4850

Removing duplicate rows

[Similar Post][1]

I have a tab-delimited spreadsheet and I'm trying to figure out a way to remove duplicate entries. Here's some made-up data that has the same form as the data in the spreadsheet:

name    phone   email   website 
Diane Grant Albrecht M.S.           
"Lannister G. Cersei M.A.T., CEP"   111-222-3333    [email protected]  www.got.com
Argle D. Bargle Ed.M.           
Sam D. Man Ed.M.    000-000-1111    [email protected]   www.daManWithThePlan.com
Sam D. Man Ed.M.    
Sam D. Man Ed.M.    111-222-333     [email protected]   www.daManWithThePlan.com
D G Bamf M.S.           
Amy Tramy Lamy Ph.D.    

I would like to have the duplicate rows for Sam D. Man merged into one that keeps the two phone numbers but doesn't store two of the same email and two of the same website.

The way I thought about doing this was to store the previous row and compare the names. If the names match, then compare the phone numbers. If the phone numbers don't match, append to the first row. Then compare the emails. If the emails don't match, append to the first row. And then compare the websites. If the websites don't match, then append the second website to the first. Then delete the second row.

  1. I don't know how to delete a row. The other posts seem to avoid actually deleting rows by writing rows to a new file. But I think this is problematic for my case, because I do not want to write the rows with the same names twice.
  2. Is there a more efficient means to loop through? Nested for loops are taking a while.
    1. And I can see myself running into issues with indexing over the limit...

Here's my code:

import csv

with open('ieca_first_col_fake_text.txt', 'rU') as f:
    sheet = csv.DictReader(f, delimiter='\t')

# This function takes a tab-delim csv and merges the ones with the same name but different phone / email / websites.
def merge_duplicates(sheet):

    # Since duplicates immediately follow, store adjacent and compare. If the same name, append phone number 
    for row in sheet:
        for other_row in sheet:
            if row['name'] == other_row['name']:
                if row['email'] != other_row['email']:
                    row['email'].append(other_row['email'])
                if row['website'] != other_row['website']:
                    row['website'].append(other_row['website'])

    # code to remove duplicate row
    # delete.() or something...

merge_duplicates(sheet)

Upvotes: 0

Views: 1121

Answers (1)

erewok

Reputation: 7835

In this case, depending on how large your 'sheet' is, it might be useful to turn your csv.DictReader object into a list, so that you can slice it and compare the various fields that way. I think your logic is correct too when you say the following:

The way I thought about doing this was to store the previous row and compare the names. 1) If the names match, then 2) compare the phone numbers. If the phone numbers don't match, 3) append to the first row. 4) Then compare the emails. 5) If the emails don't match, append to the first row. 6) And then compare the websites. 7) If the websites don't match, then append the second website to the first. Then delete the second row. (not necessary, just skip it)

Here's my (quickly written before work) recommendation:

import csv

def merge_duplicates(sheet):
    mysheet = list(sheet)

    for rowvalue, row in enumerate(mysheet):
        for other_row in mysheet[rowvalue + 1:]:  # slicing never runs past the end
            if row['name'] == other_row['name']:  # check if it's a duplicate name
                other_row['delete'] = "duplicate row"  # add delete key for later sorting
                if row['email'] != other_row['email']:
                    row['alt_email'] = other_row['email']  # add new "alt_email" key to original row
                # test other fields here...
    return mysheet

with open('ieca_first_col_fake_text.txt', 'rU') as f:
    mysheet = merge_duplicates(csv.DictReader(f, delimiter='\t'))

After that, you'd iterate through the list one more time and keep only the rows without a "delete" key.
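That final pass can be a one-line list comprehension. A sketch, with made-up rows standing in for the merged list:

```python
# Rows as they might look after the merge pass above (made-up sample data)
mysheet = [
    {'name': 'Sam D. Man Ed.M.', 'phone': '000-000-1111'},
    {'name': 'Sam D. Man Ed.M.', 'phone': '111-222-333', 'delete': 'duplicate row'},
    {'name': 'Amy Tramy Lamy Ph.D.', 'phone': ''},
]

# Keep only the rows that were never flagged with a "delete" key
cleaned = [row for row in mysheet if 'delete' not in row]
```

`'delete' not in row` tests for the key's presence, so rows flagged during the merge are dropped without ever mutating the list in place.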

Upvotes: 1
