
Reputation: 13

Find duplicate records in Python in the data below

From a set of records containing contact information, find the duplicate records and merge them if they have different contacts; otherwise, drop the duplicate. Each record has the format:

record id first_name second_name contact
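A minimal sketch of parsing one record in this format. It assumes fields are whitespace-separated and that both names are single words, so the contact is everything after the third field. The `Record` type and `parse_record` function are hypothetical names chosen for illustration; the record id is kept as a string so leading zeros like `001` survive:

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One input line: record id, first name, second name, contact."""
    record_id: str
    first_name: str
    second_name: str
    contact: str


def parse_record(line: str) -> Record:
    # maxsplit=3 keeps any remaining text together as the contact field
    record_id, first_name, second_name, contact = line.split(maxsplit=3)
    return Record(record_id, first_name, second_name, contact)


print(parse_record("002 Jai Kishor 9997125640"))
# Record(record_id='002', first_name='Jai', second_name='Kishor', contact='9997125640')
```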

Example:

001 Ram Sharma ram@gmail.com
002 Jai Kishor 9997125640
003 Ram Sharma ram@gmail.com
004 Krishna Gupta ksh@yahoo.com
005 Ram Sharma ram@gmail.com
006 Jai Kishor 1276594888
007 Ram Sharma ram-new@gmail.com

Output:

001 Ram Sharma ram@gmail.com, ram-new@gmail.com
002 Jai Kishor 9997125640, 1276594888
004 Krishna Gupta ksh@yahoo.com
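One way to produce exactly this output is to group records by name in a dict, keeping the first record id seen and an ordered list of distinct contacts. This is a sketch that assumes "duplicate" means the same first_name and second_name; `merge_records` is a hypothetical helper name:

```python
def merge_records(raw: str) -> list[str]:
    # (first_name, second_name) -> (first record id, list of distinct contacts).
    # Plain dicts keep insertion order (Python 3.7+), so rows follow first appearance.
    merged: dict[tuple[str, str], tuple[str, list[str]]] = {}
    for line in raw.strip().splitlines():
        record_id, first, second, contact = line.split(maxsplit=3)
        key = (first, second)
        if key not in merged:
            merged[key] = (record_id, [contact])
        elif contact not in merged[key][1]:
            merged[key][1].append(contact)  # same name, new contact: merge it
        # otherwise same name and same contact: the record is dropped
    return [f"{rid} {first} {second} {', '.join(contacts)}"
            for (first, second), (rid, contacts) in merged.items()]


data = """001 Ram Sharma ram@gmail.com
002 Jai Kishor 9997125640
003 Ram Sharma ram@gmail.com
004 Krishna Gupta ksh@yahoo.com
005 Ram Sharma ram@gmail.com
006 Jai Kishor 1276594888
007 Ram Sharma ram-new@gmail.com"""

for row in merge_records(data):
    print(row)
```

Keying on the name alone means two different people who share a name would be merged into one record; a stricter key would be needed if that can happen in your data.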

Please excuse any mistakes; I am new to this platform.

Upvotes: 1

Answers (3)

Amir saleem

Reputation: 1496

Code:

raw_data = """001 Ram Sharma ram@gmail.com
002 Jai Kishor 9997125640
003 Ram Sharma ram@gmail.com
004 Krishna Gupta ksh@yahoo.com
005 Ram Sharma ram@gmail.com
006 Jai Kishor 1276594888
007 Ram Sharma ram-new@gmail.com"""


def normalize(data):
    # Split each line into (record id, "first_name second_name", contact)
    dataset = [(line.split()[0], ' '.join(line.split()[1:3]), ' '.join(line.split()[3:])) for line in data.split('\n')]
    tempdict = {}
    for field in dataset:
        if field[1] in tempdict:
            # Contacts already stored for this name: everything after the id and the two name words
            known_contacts = tempdict[field[1]].split(' ', 3)[3].split(', ')
            if field[2] in known_contacts:
                continue  # same name and contact: drop the duplicate
            tempdict[field[1]] += (", " + field[2])  # new contact: merge into the first record
        else:
            tempdict[field[1]] = ' '.join(field)  # first record for this name keeps its id
    return tempdict


if __name__ == '__main__':
    new_data = normalize(data=raw_data)
    for value in new_data.values():
        print(value)

OUTPUT

001 Ram Sharma ram@gmail.com, ram-new@gmail.com
002 Jai Kishor 9997125640, 1276594888
004 Krishna Gupta ksh@yahoo.com

Upvotes: 1

Rob Py

Reputation: 156

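You can use a dictionary keyed by record id, where each value is a tuple of first_name, second_name and a list of contacts. Then drop records whose name and contact repeat an earlier one, and merge the contacts of records that share a name: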

data = """001 Ram Sharma ram@gmail.com
002 Jai Kishor 9997125640
003 Ram Sharma ram@gmail.com
004 Krishna Gupta ksh@yahoo.com
005 Ram Sharma ram@gmail.com
006 Jai Kishor 1276594888
007 Ram Sharma ram-new@gmail.com"""

data = data.split("\n")  # split by newline

aDict = {}
for item in data:
    # record id, first name, last name, and the rest of the line as the contact
    rkey, fname, lname, cntct = item.split(maxsplit=3)
    aDict[rkey] = (fname, lname, [cntct])

print('REMOVE DUPLICATE Name,Contact')
aDict2 = {}
for k in aDict:
    # a record is a duplicate only if both name and contact match one already kept
    if any(aDict[k] == aDict2[y] for y in aDict2):
        pass  # Do nothing
    else:
        aDict2[k] = aDict[k]  # add to new Dict

print('MERGE DUPLICATE Contacts')
aDict2mrg = {}
for ihc in aDict2:
    xtemp = None
    for x in aDict2mrg:
        if (aDict2[ihc][0], aDict2[ihc][1]) == (aDict2mrg[x][0], aDict2mrg[x][1]):
            for z in aDict2[ihc][2]:
                if z not in aDict2mrg[x][2]:
                    xtemp = x  # remember the earlier record with the same name (break exits only this inner loop)
                    break

    if xtemp is None:
        aDict2mrg[ihc] = (aDict2[ihc][0], aDict2[ihc][1], [aDict2[ihc][2][0]])
    else:
        value_temp_cntcts = aDict2mrg[xtemp][2]
        value_temp_cntcts.extend(aDict2[ihc][2])  # append the new contact to the earlier record's list
        value2 = (aDict2mrg[xtemp][0], aDict2mrg[xtemp][1], value_temp_cntcts)
        aDict2mrg[xtemp] = value2  # store the merged values under the earlier record's key

print('Show the Name and all the DIFFERENT Contacts in same record')
for m in aDict2mrg:
    print(m, aDict2mrg[m])

Output of the final section (each dict entry printed as key, value):

Show the Name and all the DIFFERENT Contacts in same record
001 ('Ram', 'Sharma', ['ram@gmail.com', 'ram-new@gmail.com'])
002 ('Jai', 'Kishor', ['9997125640', '1276594888'])
004 ('Krishna', 'Gupta', ['ksh@yahoo.com'])
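If you need the result in the question's one-line-per-record format rather than as printed tuples, a small formatting step over a dict shaped like `aDict2mrg` (record id mapped to first_name, second_name and a list of contacts) is enough. This is a sketch; `format_merged` is a hypothetical name:

```python
def format_merged(merged: dict) -> list[str]:
    # Each value is assumed to be (first_name, second_name, list_of_contacts)
    return [f"{rid} {fname} {lname} {', '.join(contacts)}"
            for rid, (fname, lname, contacts) in merged.items()]


# Same shape as the aDict2mrg result printed above
aDict2mrg = {
    "001": ("Ram", "Sharma", ["ram@gmail.com", "ram-new@gmail.com"]),
    "002": ("Jai", "Kishor", ["9997125640", "1276594888"]),
    "004": ("Krishna", "Gupta", ["ksh@yahoo.com"]),
}
print("\n".join(format_merged(aDict2mrg)))
```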

Upvotes: 0

SollyBunny

Reputation: 846

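Assuming you have the input as raw text, you can use a set to drop exact duplicates. Because every line starts with its own record id, the comparison has to ignore the id; this removes repeated name/contact pairs (keeping the first record) but does not merge different contacts for the same name.

data = """001 Ram Sharma ram@gmail.com
002 Jai Kishor 9997125640
003 Ram Sharma ram@gmail.com
004 Krishna Gupta ksh@yahoo.com
005 Ram Sharma ram@gmail.com
006 Jai Kishor 1276594888
007 Ram Sharma ram-new@gmail.com"""

data = data.split("\n") # split by newline
seen = set() # name + contact strings already kept
unique = []
for line in data:
    record_id, rest = line.split(" ", 1) # ignore the record id when comparing
    if rest not in seen: # keep only the first record for each name + contact
        seen.add(rest)
        unique.append(line)
output = "\n".join(unique) # join the data back together
print(output) # print output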
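Sets have no defined order, so collecting lines through a set and back loses the input order unless you sort. If you would rather keep lines in their first-seen order, `dict.fromkeys` removes exact duplicates while preserving order (dicts keep insertion order in Python 3.7+). A sketch; the sample lines below are hypothetical, with record ids already stripped so the duplicates are exact:

```python
# Hypothetical sample lines with the record ids already stripped
lines = [
    "Ram Sharma ram@gmail.com",
    "Jai Kishor 9997125640",
    "Ram Sharma ram@gmail.com",
    "Krishna Gupta ksh@yahoo.com",
]

# set() drops exact duplicates but has no defined order, hence the sort
via_set = sorted(set(lines))

# dict.fromkeys keeps the first occurrence of each line, in input order
unique_lines = list(dict.fromkeys(lines))

print(via_set)
print(unique_lines)
```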

Upvotes: 0
