techsmart

Reputation: 105

Searching for a specific value while iterating through a list of lists of strings to find matches in a list of strings in Python

I have 2 lists.

paragraphs = [ 'The tablets are filled into cylindrically shaped bottles made of white coloured\npolyethylene. The volumes of the bottles depend on the tablet strength and amount of\ntablets, ranging from 20 to 175 ml. The screw type cap is made of white coloured\npolypropylene and is equipped with a tamper proof ring.', 'PVC/PVDC blister pack', 'Blisters are made in a cold-forming process from an aluminium base web. Each tablet is\nfilled into a separate blister and a lidding foil of aluminium is welded on. The blisters\nare opened by pressing the tablets through the lidding foil.', '\n']


final_ref = [['Blister', 'Foil', 'Aluminium'], ['Blister', 'Base Web', 'PVC/PVDC'], ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], ['Bottle', 'Screw Type Cap', 'Polypropylene'], ['Blister', 'Base Web', 'PVC'], ['Blister', 'Base Web', 'PVD/PVDC'], ['Bottle', 'Square Shaped Bottle', 'Polyethylene']]

The code below checks every paragraph against every entry of final_ref and records the paragraphs that contain all of an entry's terms:

counter = 1
result=[]

for words in final_ref:
    for sen in paragraphs:
        all_exist = True
        for w in words:
            if w.lower() not in sen.lower():
                all_exist = False
                break
        if all_exist:
            #print(words[0])
            colours = ["White","Yellow","Blue","Red","Green","Black","Brown","Silver","Purple","Navy blue","Gray","Orange","Maroon","pink","colourless","blue"]
            if words[0] == 'Bottle':
                for wd in colours:
                    if wd in sen.split():
                        wd = wd

            fr = "Stage " + str(counter) + ": " + "Package Description" + ": " + sen + " Values" + ": " + str(words) + "Colour" + ": " + str(wd) + "\n" + "\n" + "\n"
            result.append(fr)
            result = [i.replace('\n','') for i in result]
            result = [i.replace('\t','') for i in result]
            counter += 1
print(result)

When I run this, the Stage 1 entry ends with

Values: ['Blister', 'Foil', 'Aluminium']Colour: blue

which is what I don't want. For Stage 2 and Stage 3 the colour value is fine.

I want to check words[0], and only if it is 'Bottle' should the paragraph be searched for a colour. If no colour is found, I just want the Bottle entry returned without a colour value. At the moment the code adds a colour to every stage. Any idea how to solve this?

Expected Output:

["Group 1: Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister', 'Foil', 'Aluminium']",


"Group 2: Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene']Colour: white", 

"Group 2: Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene']Colour: white"]

Upvotes: 1

Views: 77

Answers (1)

Shipra

Reputation: 1299

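This is a more Pythonic way to do it, and it produces the expected output. The colour lookup runs only when words[0] is 'Bottle', and the comparison ignores case, so 'White' matches 'white' in the text.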

paragraphs = ['The tablets are filled into cylindrically shaped bottles made of white coloured\npolyethylene. The volumes of the bottles depend on the tablet strength and amount of\ntablets, ranging from 20 to 175 ml. The screw type cap is made of white coloured\npolypropylene and is equipped with a tamper proof ring.', 'PVC/PVDC blister pack', 'Blisters are made in a cold-forming process from an aluminium base web. Each tablet is\nfilled into a separate blister and a lidding foil of aluminium is welded on. The blisters\nare opened by pressing the tablets through the lidding foil.', '\n']

final_ref = [['Blister', 'Foil', 'Aluminium'], ['Blister', 'Base Web', 'PVC/PVDC'], ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], ['Bottle', 'Screw Type Cap', 'Polypropylene'], ['Blister', 'Base Web', 'PVC'], ['Blister', 'Base Web', 'PVD/PVDC'], ['Bottle', 'Square Shaped Bottle', 'Polyethylene']]

colours = ['White', 'Yellow', 'Blue', 'Red', 'Green', 'Black', 'Brown', 'Silver', 'Purple', 'Navy blue', 'Gray', 'Orange', 'Maroon', 'pink', 'colourless', 'blue']

TEXT_WITHOUT_COLOUR = 'Stage {counter} : Package Description: {sen} Values: {values}'

TEXT_WITH_COLOUR = TEXT_WITHOUT_COLOUR + ' Colour: {colour}'

counter = 1
result = []


def is_missing(words, sen):
    for w in words:
        if w.lower() not in sen.lower():
            return True
    return False


for words in final_ref:
    for sen in paragraphs:
        if is_missing(words, sen):
            continue

        kwargs = {
            'counter': counter,
            'sen': sen,
            'values': str(words)
        }

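        # Default to the colourless template; switch only once a colour is found,
        # so a Bottle entry without a colour no longer raises KeyError in format().
        text_const = TEXT_WITHOUT_COLOUR
        if words[0] == 'Bottle':
            for wd in colours:
                if wd.lower() in sen.lower():
                    kwargs['colour'] = wd
                    text_const = TEXT_WITH_COLOUR
                    break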

        result.append(text_const.format(**kwargs).replace('\n', '').replace('\t', ''))
        counter += 1

print(result)

Output:

["Stage 1 : Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister', 'Foil', 'Aluminium']",

"Stage 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'] Colour: White",

"Stage 3 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'] Colour: White"]

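As a side note, the is_missing helper can also be expressed with the built-in all(), which reads closer to "every term occurs in the sentence". The snippet below is only a sketch: matches_all is a made-up name, and the two sample lists are shortened stand-ins for the real data.

```python
def matches_all(words, sentence):
    """Return True when every term in words occurs in sentence, ignoring case."""
    lowered = sentence.lower()  # lower-case the sentence once, not once per term
    return all(w.lower() in lowered for w in words)


# Shortened sample data, made up for illustration only.
paragraphs = ['PVC/PVDC blister pack',
              'a lidding foil of aluminium is welded on each blister']
final_ref = [['Blister', 'Foil', 'Aluminium'],
             ['Blister', 'Base Web', 'PVC/PVDC']]

# Keep every (entry, paragraph) pair in which all terms of the entry occur.
pairs = [(words, sen)
         for words in final_ref
         for sen in paragraphs
         if matches_all(words, sen)]
print(pairs)
```

With this sample data only the first entry pairs up, and only with the second paragraph.
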
Hope this helps.
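
One caveat about the substring test wd.lower() in sen.lower(): 'red' is contained in 'coloured', so the result for this data comes out as White only because 'White' is checked before 'Red'. If that matters, a whole-word match is safer. The following is a minimal sketch using the standard re module; find_colour is a hypothetical helper name, not something from the code above.

```python
import re


def find_colour(sentence, colours):
    """Return the first colour that occurs as a whole word in sentence, else None."""
    for colour in colours:
        # \b matches at word boundaries, so 'Red' does not match inside 'coloured'.
        pattern = r'\b' + re.escape(colour) + r'\b'
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            return colour
    return None


colours = ['White', 'Yellow', 'Blue', 'Red']

print(find_colour('bottles made of white coloured\npolyethylene', colours))  # White
print(find_colour('a coloured polyethylene cap', colours))                     # None
print('red' in 'a coloured polyethylene cap')                                  # True
```

Because the helper returns None when nothing matches, the caller can pick the template without a colour in that case.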

Upvotes: 1
