Reputation: 13
Given a set of records containing contact information, find the duplicate records. If the duplicates hold different contacts, merge them into one record; otherwise, deprecate the duplicate. Each record has the format:
record_id first_name second_name contact
Example:
001 Ram Sharma [email protected]
002 Jai Kishor 9997125640
003 Ram Sharma [email protected]
004 Krishna Gupta [email protected]
005 Ram Sharma [email protected]
006 Jai Kishor 1276594888
007 Ram Sharma [email protected]
Output:
001 Ram Sharma [email protected], [email protected]
002 Jai Kishor 9997125640, 1276594888
004 Krishna Gupta [email protected]
Please excuse any mistakes, as I am new to this platform.
Upvotes: 1
Views: 128
Reputation: 1496
Code:
raw_data = """001 Ram Sharma [email protected]
002 Jai Kishor 9997125640
003 Ram Sharma [email protected]
004 Krishna Gupta [email protected]
005 Ram Sharma [email protected]
006 Jai Kishor 1276594888
007 Ram Sharma [email protected]"""
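def normalize(data):
    # Split each line into (record id, full name, contact).
    # Uses the `data` argument rather than the global `raw_data`.
    dataset = [(line.split()[0], ' '.join(line.split()[1:3]), ' '.join(line.split()[3:]))
               for line in data.split('\n')]
    tempdict = {}
    for field in dataset:
        if field[1] in tempdict:
            # Substring test: skip a contact already present in the merged string.
            if field[2] in tempdict[field[1]]:
                continue
            tempdict[field[1]] += (", " + field[2])
        else:
            # First record for this name: keep its id, name, and contact.
            tempdict[field[1]] = ' '.join(field)
    return tempdict

if __name__ == '__main__':
    new_data = normalize(data=raw_data)
    for value in new_data.values():
        print(value)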
OUTPUT
001 Ram Sharma [email protected], [email protected]
002 Jai Kishor 9997125640, 1276594888
004 Krishna Gupta [email protected]
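A variant of the same idea, as a minimal sketch: keeping each person's contacts in a list avoids the substring test `field[2] in tempdict[field[1]]`, which would also treat a contact such as `999` as already present because it occurs inside `9997125640`. The name `merge_records` is hypothetical, and the `example.com` addresses are placeholders, since the real addresses are hidden in the post.

```python
def merge_records(text):
    """Merge records that share a name, keeping the first record id seen."""
    merged = {}  # full name -> (record id, list of contacts)
    for line in text.strip().splitlines():
        record_id, first, last, contact = line.split(maxsplit=3)
        name = f"{first} {last}"
        if name not in merged:
            merged[name] = (record_id, [contact])
        elif contact not in merged[name][1]:
            # Exact list membership, not a substring match.
            merged[name][1].append(contact)
    return [f"{rid} {name} {', '.join(contacts)}"
            for name, (rid, contacts) in merged.items()]


# Placeholder sample data (hypothetical addresses, not the ones from the post).
sample = """001 Ram Sharma ram@example.com
002 Jai Kishor 9997125640
003 Ram Sharma ram.s@example.com
004 Krishna Gupta krishna@example.com
005 Ram Sharma ram@example.com
006 Jai Kishor 1276594888
007 Ram Sharma ram.s@example.com"""

for row in merge_records(sample):
    print(row)
```

This relies on dictionaries preserving insertion order (Python 3.7 and later), so the rows come out in the order each name first appears.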
Upvotes: 1
Reputation: 156
You can use a dictionary keyed by record id, where each value is a tuple of first_name, second_name, and a list of contacts:
data = """001 Ram Sharma [email protected]
002 Jai Kishor 9997125640
003 Ram Sharma [email protected]
004 Krishna Gupta [email protected]
005 Ram Sharma [email protected]
006 Jai Kishor 1276594888
007 Ram Sharma [email protected]"""
data = data.split("\n")  # split by newline
aDict = {}
for item in data:
    rkey, s2 = item.split(" ", 1)
    fname, s3 = s2.split(" ", 1)
    lname, cntct = s3.split(" ", 1)
    aDict[rkey] = (fname, lname, [cntct,])
print('REMOVE DUPLICATE Name,Contact')
aDict2 = {}
for k in aDict:
    if any(aDict[k][2] == aDict[y][2] for y in aDict2):
        pass  # Do nothing
    else:
        aDict2[k] = aDict[k]  # add to new Dict
print('MERGE DUPLICATE Contacts')
aDict2mrg = {}
for ihc in aDict2:
    xtemp = None
    for x in aDict2mrg:
        if (aDict2[ihc][0], aDict2[ihc][1]) == (aDict2mrg[x][0], aDict2mrg[x][1]):
            # print(aDict2[ihc][2], aDict2mrg[x][2])
            for z in aDict2[ihc][2]:
                if z not in aDict2mrg[x][2]:
                    xtemp = x  # assign the different dict key to temp value and break out of loop
                    break
    if xtemp is None:
        aDict2mrg[ihc] = (aDict2[ihc][0], aDict2[ihc][1], [aDict2[ihc][2][0],])
    else:
        value_temp_cntcts = aDict2mrg[xtemp][2]
        value_temp_cntcts.extend(aDict2[ihc][2])  # assign the different contact to preceding values
        value2 = (aDict2mrg[xtemp][0], aDict2mrg[xtemp][1], value_temp_cntcts)
        aDict2mrg[xtemp] = value2  # assign the changed values to the same dict key
print('Show the Name and all the DIFFERENT Contacts in same record')
for m in aDict2mrg:
    print(m, aDict2mrg[m])
Output, with each dict entry printed as key, value:
Show the Name and all the DIFFERENT Contacts in same record
001 ('Ram', 'Sharma', ['[email protected]', '[email protected]'])
002 ('Jai', 'Kishor', ['9997125640', '1276594888'])
004 ('Krishna', 'Gupta', ['[email protected]'])
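The passes above can also be collapsed into one by keying the dictionary on the name instead of the record id. The sketch below is one possible shape, not this answer's method: `Contact`, `merge_by_name`, and the `deprecated_ids` field are hypothetical names, and the `example.com` addresses stand in for the hidden ones. It also records which record ids were folded into the first record, which covers the "deprecate" part of the question.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    record_id: str
    first_name: str
    second_name: str
    contacts: list = field(default_factory=list)
    deprecated_ids: list = field(default_factory=list)  # ids merged into this record


def merge_by_name(lines):
    """Single pass: the first record per name survives, later ones are folded in."""
    by_name = {}
    for line in lines:
        rid, first, second, contact = line.split(maxsplit=3)
        entry = by_name.get((first, second))
        if entry is None:
            by_name[(first, second)] = Contact(rid, first, second, [contact])
            continue
        entry.deprecated_ids.append(rid)
        if contact not in entry.contacts:
            entry.contacts.append(contact)
    return list(by_name.values())


# Placeholder sample data (hypothetical addresses, not the ones from the post).
sample = """001 Ram Sharma ram@example.com
002 Jai Kishor 9997125640
003 Ram Sharma ram.s@example.com
004 Krishna Gupta krishna@example.com
005 Ram Sharma ram@example.com
006 Jai Kishor 1276594888
007 Ram Sharma ram.s@example.com"""

for c in merge_by_name(sample.splitlines()):
    print(c.record_id, c.first_name, c.second_name, c.contacts, c.deprecated_ids)
```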
Upvotes: 0
Reputation: 846
Assuming you have the input as raw text, you can use a set to combine any exact duplicates:
data = """001 Ram Sharma [email protected]
002 Jai Kishor 9997125640
003 Ram Sharma [email protected]
004 Krishna Gupta [email protected]
005 Ram Sharma [email protected]
006 Jai Kishor 1276594888
007 Ram Sharma [email protected]"""
data = data.split("\n") # split by newline
data = set(data) # remove any duplicates
data = sorted(data) # sort them (just in case) and turn it back to list
output = "\n".join(data) # join the data back together
print(output) # print output
Upvotes: 0