techsmart

Reputation: 105

Counter inside for loop adds duplicates to the output: how to remove duplicate entries in Python

I want to remove duplicate entries from my function's output.

I have a function that searches for relationships inside each paragraph of a text file. A sample of the metadata.csv file, which lists the relationships to search for in the paragraphs, is as follows:

Blister     Base Web    PVC/PVDC
Blister     Foil         Aluminium
Blister     Base Web    PVC/PVDC
Blister     Foil         Aluminium
Vial        Glass       Borosilicate Glass
Vial        Stopper     Bromobutyl Rubber
Vial        Cap         Aluminium

The sample text file is as follows:

The tablets are filled into cylindrically shaped bottles made of white coloured
polyethylene. The volumes of the bottles depend on the tablet strength and amount of
tablets, ranging from 20 to 175 ml. The screw type cap is made of white coloured
polypropylene and is equipped with a tamper proof ring.

PVC/PVDC blister pack

Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tablet
is filled into a separate blister and a lidding foil of aluminium is welded on. The blisters
are opened by pressing the tablets through the lidding foil. PVDC foil is in contact with
the tablets.

Aluminium blister pack

Blisters are made in a cold-forming process from an aluminium base web. Each tablet is
filled into a separate blister and a lidding foil of aluminium is welded on. The blisters
are opened by pressing the tablets through the lidding foil.

So there will be 3 groups for this text file.

The function is as follows:

import csv
import re
import os         
def extractor(filepath):
    #pdb.set_trace()    
    TEXT_WITHOUT_COLOUR = 'Stage {counter} : Package Description: {sen} Values: {values}'
    TEXT_WITH_COLOUR = TEXT_WITHOUT_COLOUR + ','  ' Colour: {colour}'
    colours = ['White', 'Yellow', 'Blue', 'Red', 'Green', 'Black', 'Brown', 'Silver', 'Purple', 'Navy blue', 'Gray', 'Orange', 'Maroon', 'pink', 'colourless', 'blue']
    counter = 1
    result = [] 
    unique_desc = [] #every unique description is stored 
    output      = [] 
    with open(filepath, encoding='utf-8') as f:
        data=f.read()
        paragraphs=data.split("\n\n")
    inputfile = r"C:\metadata.csv"                
    inputm = []

    with open(inputfile, "r") as f:
        reader = csv.reader(f, delimiter="\t")
        for row in reader:
            #types = row.split(',')
            inputm.append(row)

    final_ref = [] 
    for lists in inputm:
        final_ref.append(str(lists[0]).split(','))
    def is_missing(words, sen):
        for w in words:
            if w.lower() not in sen.lower():
                return True
        return False

    for sen in paragraphs:
        for words in final_ref:
            if is_missing(words, sen):
                continue

            kwargs = {
                'counter': counter,
                'sen': sen,
                'values': str(words)
            }

            if (words[0] == 'Bottle') or (words[0]=='Vial') or (words[0] =='Container') or (words[0] =='Ampoules') or (words[0] =='Occlusive Dressing'):
                for wd in colours:
                    if wd.lower() in sen.lower():
                        kwargs['colour'] = wd
                        break
                text_const = TEXT_WITH_COLOUR
            else:
                text_const = TEXT_WITHOUT_COLOUR

            result.append(text_const.format(**kwargs).replace('\n', '').replace('\t', ''))

            for desc in result:

                compare = re.search(r'Package Description:(.*?)Values:',desc).group(1).replace(' ','') #clean spaces

                if compare in unique_desc:  

                    group = str(unique_desc.index(compare)+1) #index starts in 0 and group in 1     
                    desc = re.sub(r'Stage \d','Group '+group, desc)
                    output.append(desc)

                else: 

                    unique_desc.append(compare)     
                    group = str(len(unique_desc))    #new group

                    desc = re.sub(r'Stage \d','Group '+group, desc)
                    output.append(desc)
                    counter+=1
                    break

    return output      

It returns the following output:

["Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Tamper Proof Ring', ''], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Tamper Proof Ring', ''], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cap', 'Polypropylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Tamper Proof Ring', ''], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cap', 'Polypropylene'], Colour: White",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Foil', 'Aluminium']",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Tamper Proof Ring', ''], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cap', 'Polypropylene'], Colour: White",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Foil', 'Aluminium']",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC/PVDC']",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Tamper Proof Ring', ''], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cap', 'Polypropylene'], Colour: White",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Foil', 'Aluminium']",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC/PVDC']",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC']",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Tamper Proof Ring', ''], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cap', 'Polypropylene'], Colour: White",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Foil', 'Aluminium']",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC/PVDC']",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC']",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVDC']",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Tamper Proof Ring', ''], Colour: White",
 "Group 1 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cap', 'Polypropylene'], Colour: White",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Foil', 'Aluminium']",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC/PVDC']",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC']",
 "Group 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVDC']",
 "Group 3 : Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister', 'Foil', 'Aluminium']"]

Entries with exactly the same 'Package Description' and 'Values' should appear only once in the output. I think the problem is with my break statement and the counter variable. Any help would be appreciated.

Upvotes: 0

Views: 37

Answers (1)

S.N

Reputation: 5140

If you prefer, you can remove the duplicates with a set. Note, however, that a set guarantees uniqueness but does not preserve the original order.

lst = ['foo', 'bar', 'foo', 'bar']
s = set(lst)
print(s)  # prints {'foo', 'bar'}; element order may vary
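
If the order matters, one option is dict.fromkeys, since dict keys are unique and keep insertion order (guaranteed from Python 3.7). For the function in the question, the repeats appear to come from the `for desc in result:` loop sitting inside the matching loops, so the whole `result` list is scanned again after every new match. Below is a minimal sketch of an alternative, assuming the entries are the formatted strings built from TEXT_WITHOUT_COLOUR / TEXT_WITH_COLOUR and that it runs once, after both loops have finished. The name dedupe_and_group is a hypothetical helper, not part of the original code.

```python
import re

# Order-preserving alternative to set(): dict keys are unique and keep
# insertion order (guaranteed from Python 3.7).
lst = ['foo', 'bar', 'foo', 'bar']
unique = list(dict.fromkeys(lst))
print(unique)  # ['foo', 'bar']


def dedupe_and_group(entries):
    """Hypothetical helper: drop entries whose description and values were
    already seen, and number groups by first-seen description."""
    unique_desc = []  # descriptions in first-seen order; index + 1 = group number
    seen = set()      # (description, values) pairs already emitted
    output = []
    for entry in entries:
        match = re.search(r'Package Description:(.*?)Values:(.*)', entry)
        if match is None:
            continue  # assumption: skip entries that do not follow the format
        desc = match.group(1).strip()
        values = match.group(2).strip()
        if (desc, values) in seen:
            continue  # exact duplicate, ignore it
        seen.add((desc, values))
        if desc not in unique_desc:
            unique_desc.append(desc)
        group = unique_desc.index(desc) + 1
        output.append(re.sub(r'^Stage \d+', 'Group {}'.format(group), entry))
    return output
```

Inside extractor, this would replace the inner `for desc in result:` loop with a single `return dedupe_and_group(result)` at the end of the function. Entries that share a description but differ in their values are kept and receive the same group number.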

Upvotes: 1
