techsmart
techsmart

Reputation: 105

Group by values of string and keeping a count on the number of groups, in Python

I have a piece of code:

paragraphs = ['The tablets are filled into cylindrically shaped bottles made of white coloured\npolyethylene. The volumes of the bottles depend on the tablet strength and amount of\ntablets, ranging from 20 to 175 ml. The screw type cap is made of white coloured\npolypropylene and is equipped with a tamper proof ring.', 'PVC/PVDC blister pack', 'Blisters are made in a cold-forming process from an aluminium base web. Each tablet is\nfilled into a separate blister and a lidding foil of aluminium is welded on. The blisters\nare opened by pressing the tablets through the lidding foil.', '\n']

final_ref = [['Blister', 'Foil', 'Aluminium'], ['Blister', 'Base Web', 'PVC/PVDC'], ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], ['Bottle', 'Screw Type Cap', 'Polypropylene'], ['Blister', 'Base Web', 'PVC'], ['Blister', 'Base Web', 'PVD/PVDC'], ['Bottle', 'Square Shaped Bottle', 'Polyethylene']]

colours = ['White', 'Yellow', 'Blue', 'Red', 'Green', 'Black', 'Brown', 'Silver', 'Purple', 'Navy blue', 'Gray', 'Orange', 'Maroon', 'pink', 'colourless', 'blue']

TEXT_WITHOUT_COLOUR = 'Stage {counter} : Package Description: {sen} Values: {values}'

TEXT_WITH_COLOUR = TEXT_WITHOUT_COLOUR + ' Colour: {colour}'

counter = 1
result = []


def is_missing(words, sen):
    for w in words:
        if w.lower() not in sen.lower():
            return True
    return False


for words in final_ref:
    for sen in paragraphs:
        if is_missing(words, sen):
            continue

        kwargs = {
            'counter': counter,
            'sen': sen,
            'values': str(words)
        }

        if words[0] == 'Bottle':
            for wd in colours:
                if wd.lower() in sen.lower():
                    kwargs['colour'] = wd
                    break
            text_const = TEXT_WITH_COLOUR
        else:
            text_const = TEXT_WITHOUT_COLOUR

        result.append(text_const.format(**kwargs).replace('\n', '').replace('\t', ''))
        counter += 1

print(result)

which returns output as:

["Stage 1 : Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister', 'Foil', 'Aluminium']",

"Stage 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'] Colour: White",

"Stage 3 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'] Colour: White"]

What I want to do is to check the content of the 'Package Description' and if it is the same, I want to group all the different 'Values' under the same Group Number

So, I want the output to come in the following format:

["Group 1: Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister', 'Foil', 'Aluminium'], 


"Group 2: Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene']Colour: white", 

"Group 2: Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene']Colour: white"]

TEST SAMPLE:

ls = ["Stage 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Foil', 'Aluminium']",
     "Stage 2 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC/PVDC']",
     "Stage 3 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",
     "Stage 4 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'], Colour: White",
     "Stage 5 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC']",
     "Stage 6 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Tamper Proof Ring', ''], Colour: White",
     "Stage 7 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVDC']",
     "Stage 8 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cap', 'Polypropylene'], Colour: White"]

Expected output for this sample will be:

["Group 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Foil', 'Aluminium']",
"Group 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC']"
 "Group 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC/PVDC']",
"Group 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVDC']",

 "Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",

 "Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'], Colour: White",

 "Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Tamper Proof Ring', ''], Colour: White",

 "Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cap', 'Polypropylene'], Colour: White"]

Upvotes: 0

Views: 68

Answers (1)

Onyambu
Onyambu

Reputation: 79378

Let ls be your output above, then

import re
from itertools import groupby
fun = lambda x: re.search("Package Description:.*?:",x).group()
sum([re.sub("Stage \\d",f"Group {i+1}","SPLIT".join(k)).split("SPLIT") for i,(_,k) in enumerate(groupby(sorted(ls,key = fun),key = fun))],[])


["Group 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Foil', 'Aluminium']",
 "Group 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC/PVDC']",
 "Group 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC']",
 "Group 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVDC']",
 "Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",
 "Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'], Colour: White",
 "Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Tamper Proof Ring', ''], Colour: White",
 "Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cap', 'Polypropylene'], Colour: White"]

writting that in a more neat way:

import re
from itertools import groupby
fun = lambda x: re.search("Package Description:.*?:",x).group()
a = []
for i,(_, k) in enumerate(groupby(sorted(ls,key = fun),key = fun)):
    a += (re.sub("Stage \\d",f"Group {i+1}","SPLIT".join(k)).split("SPLIT"))

a

["Group 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Foil', 'Aluminium']",
 "Group 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC/PVDC']",
 "Group 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVC']",
 "Group 1 : Package Description: Blisters are made in a thermo-forming process from a PVC/PVDC base web. Each tabletis filled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. PVDC foil is in contact withthe tablets. Values: ['Blister', 'Base Web', 'PVDC']",
 "Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], Colour: White",
 "Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'], Colour: White",
 "Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Tamper Proof Ring', ''], Colour: White",
 "Group 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cap', 'Polypropylene'], Colour: White"]

Upvotes: 1

Related Questions