Reputation: 105
I have 2 lists.
paragraphs = [ 'The tablets are filled into cylindrically shaped bottles made of white coloured\npolyethylene. The volumes of the bottles depend on the tablet strength and amount of\ntablets, ranging from 20 to 175 ml. The screw type cap is made of white coloured\npolypropylene and is equipped with a tamper proof ring.', 'PVC/PVDC blister pack', 'Blisters are made in a cold-forming process from an aluminium base web. Each tablet is\nfilled into a separate blister and a lidding foil of aluminium is welded on. The blisters\nare opened by pressing the tablets through the lidding foil.', '\n']
final_ref = [['Blister', 'Foil', 'Aluminium'], ['Blister', 'Base Web', 'PVC/PVDC'], ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], ['Bottle', 'Screw Type Cap', 'Polypropylene'], ['Blister', 'Base Web', 'PVC'], ['Blister', 'Base Web', 'PVD/PVDC'], ['Bottle', 'Square Shaped Bottle', 'Polyethylene']]
The code written below separates each paragraph and extracts matches inside it. Code:
counter = 1
result=[]
for words in final_ref:
for sen in paragraphs:
all_exist = True
for w in words:
if w.lower() not in sen.lower():
all_exist = False
break
if all_exist:
#print(words[0])
colours = ["White","Yellow","Blue","Red","Green","Black","Brown","Silver","Purple","Navy blue","Gray","Orange","Maroon","pink","colourless","blue"]
if words[0] == 'Bottle':
for wd in colours:
if wd in sen.split():
wd = wd
fr = "Stage " + str(counter) + ": " + "Package Description" + ": " + sen + " Values" + ": " + str(words) + "Colour" + ": " + str(wd) + "\n" + "\n" + "\n"
result.append(fr)
result = [i.replace('\n','') for i in result]
result = [i.replace('\t','') for i in result]
counter += 1
print(result)
Now
Values: ['Blister', 'Foil', 'Aluminium']Colour: blue"] as output
which is coming under Stage 1 when you run this code is what I don't want. For the Stage 2 and Stage 3, colour value is fine
I want to check the values of words[0] and if 'Bottle' is present in it, I want to search for the colour in that string. If not found, I just want to return bottle without any colour values. This code extracts colour for every "Stage" which I don't want. Only if 'Bottle' is there in words[0] colour should be searched for. Any idea on how to solve this
Expected Output:
["Group 1: Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister', 'Foil', 'Aluminium'],
"Group 2: Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene']Colour: white",
"Group 2: Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene']Colour: white"]
Upvotes: 1
Views: 77
Reputation: 1299
This is the more pythonic way to do it and produces expected output.
paragraphs = ['The tablets are filled into cylindrically shaped bottles made of white coloured\npolyethylene. The volumes of the bottles depend on the tablet strength and amount of\ntablets, ranging from 20 to 175 ml. The screw type cap is made of white coloured\npolypropylene and is equipped with a tamper proof ring.', 'PVC/PVDC blister pack', 'Blisters are made in a cold-forming process from an aluminium base web. Each tablet is\nfilled into a separate blister and a lidding foil of aluminium is welded on. The blisters\nare opened by pressing the tablets through the lidding foil.', '\n']
final_ref = [['Blister', 'Foil', 'Aluminium'], ['Blister', 'Base Web', 'PVC/PVDC'], ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], ['Bottle', 'Screw Type Cap', 'Polypropylene'], ['Blister', 'Base Web', 'PVC'], ['Blister', 'Base Web', 'PVD/PVDC'], ['Bottle', 'Square Shaped Bottle', 'Polyethylene']]
colours = ['White', 'Yellow', 'Blue', 'Red', 'Green', 'Black', 'Brown', 'Silver', 'Purple', 'Navy blue', 'Gray', 'Orange', 'Maroon', 'pink', 'colourless', 'blue']
TEXT_WITHOUT_COLOUR = 'Stage {counter} : Package Description: {sen} Values: {values}'
TEXT_WITH_COLOUR = TEXT_WITHOUT_COLOUR + ' Colour: {colour}'
counter = 1
result = []
def is_missing(words, sen):
for w in words:
if w.lower() not in sen.lower():
return True
return False
for words in final_ref:
for sen in paragraphs:
if is_missing(words, sen):
continue
kwargs = {
'counter': counter,
'sen': sen,
'values': str(words)
}
if words[0] == 'Bottle':
for wd in colours:
if wd.lower() in sen.lower():
kwargs['colour'] = wd
break
text_const = TEXT_WITH_COLOUR
else:
text_const = TEXT_WITHOUT_COLOUR
result.append(text_const.format(**kwargs).replace('\n', '').replace('\t', ''))
counter += 1
print(result)
Output:
["Stage 1 : Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister', 'Foil', 'Aluminium']",
"Stage 2 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'] Colour: White",
"Stage 3 : Package Description: The tablets are filled into cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'] Colour: White"]
Hope this helps.
Upvotes: 1