Reputation: 1134
I have two strings:
"TOP : Cotton + Embroidered ( 2 Mtr) \nBOTTOM : Cotton + Solid (2 Mtr) \nDUPATTA : Chiffon + Lace Work ( 2 Mtr) \nTYPE : Un Stitched\nCOLOUR : Multi Colour \nCONTAINS : 1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA\nCountry of Origin: India"
and the second one is:
"Top Fabric: Cotton Cambric + Top Length: 0-2.00\nBottom Fabric: Cotton Cambric + Bottom Length: 0-2.00\nDupatta Fabric: Nazneen + Dupatta Length: 0-2.00\nLining Fabric: Cotton Cambric\nType: Un Stitched\nPattern: Printed\nMultipack: 3 Top\nCountry of Origin: India"
I need to create a Python dictionary from these two strings, using the text before each colon as the key.
For example, in the first string the keys would be
TOP, BOTTOM, DUPATTA, TYPE, COLOUR, CONTAINS, Country of Origin
and in the second one the keys would be
Top Fabric, Bottom Fabric, Top Length, Bottom Length, Dupatta Fabric, Dupatta Length, Lining Fabric, Type, Pattern, Multipack, Country of Origin
So far I have tried the following (desc holds the input string):
import re
keys = ["Top Fabric","Bottom Fabric","Dupatta Fabric","Lining Fabric","Type","Pattern","Multipack","TOP ","BOTTOM "," DUPATTA ","COLOUR ","CONTAINS ","TYPE ","Country"]
pattern = re.compile(r'({})\s+'.format(':|'.join(keys)))
newdict = dict(zip(*[(i.strip() for i in (pattern.split(desc.replace("*",""))) if i)]*2))
but it does not work on the first string, and on the second string it does not create every key and value.
Upvotes: 1
Views: 108
Reputation: 563
You could try the dict comprehension below, where s1 is one of your strings:
d={i.split(':')[0].strip(): i.split(':')[1].strip() for i in s1.split('\n')}
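A minimal sketch of the same idea, assuming you may also want to skip lines that contain no colon: the one-liner above raises an IndexError on such a line (for example a trailing blank line). It uses str.partition, which splits on the first colon only; the helper name parse_lines is just for illustration. Like the comprehension above, it does not separate pairs joined by + on one line, and because it keeps everything after the first colon, 'Top Fabric' would get the value 'Cotton Cambric + Top Length: 0-2.00'.

```python
def parse_lines(text):
    """Build a dict from 'key: value' lines (hypothetical helper)."""
    result = {}
    for line in text.split('\n'):
        # partition splits on the first colon only; sep is '' if there is none
        key, sep, value = line.partition(':')
        if not sep:  # no colon on this line, e.g. a blank line
            continue
        result[key.strip()] = value.strip()
    return result

s1 = ("TOP : Cotton + Embroidered ( 2 Mtr) \nBOTTOM : Cotton + Solid (2 Mtr) \n"
      "DUPATTA : Chiffon + Lace Work ( 2 Mtr) \nTYPE : Un Stitched\n"
      "COLOUR : Multi Colour \nCONTAINS : 1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA\n"
      "Country of Origin: India")

d = parse_lines(s1 + "\n\n")  # the trailing blank lines are skipped
print(d['TOP'])
```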
Edit: to make combining the dicts easier, you can define a function:
def f(s1):
    return {i.split(':')[0].strip(): i.split(':')[1].strip() for i in s1.split('\n')}
f('\n'.join([s1,s2])) # single dict from both strings
set(f(s1).keys()).intersection(f(s2).keys()) # common keys
This returns {'Country of Origin'}: it is the only key common to both strings, and its value is India in both.
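When the two dicts are merged into one, a key that appears in both is silently overwritten by the later value. A small sketch, assuming you want to check for such clashes before merging; the names d1, d2 and conflicts are only illustrative. Dict key views support set operations, so & gives the common keys directly.

```python
def f(s):
    # same one-liner as in the answer above
    return {i.split(':')[0].strip(): i.split(':')[1].strip() for i in s.split('\n')}

s1 = ("TOP : Cotton + Embroidered ( 2 Mtr) \nBOTTOM : Cotton + Solid (2 Mtr) \n"
      "DUPATTA : Chiffon + Lace Work ( 2 Mtr) \nTYPE : Un Stitched\n"
      "COLOUR : Multi Colour \nCONTAINS : 1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA\n"
      "Country of Origin: India")
s2 = ("Top Fabric: Cotton Cambric + Top Length: 0-2.00\n"
      "Bottom Fabric: Cotton Cambric + Bottom Length: 0-2.00\n"
      "Dupatta Fabric: Nazneen + Dupatta Length: 0-2.00\nLining Fabric: Cotton Cambric\n"
      "Type: Un Stitched\nPattern: Printed\nMultipack: 3 Top\nCountry of Origin: India")

d1, d2 = f(s1), f(s2)
common = d1.keys() & d2.keys()                         # keys present in both dicts
conflicts = {k for k in common if d1[k] != d2[k]}      # common keys whose values differ
merged = {**d1, **d2}                                  # on a clash, the value from d2 wins
print(common, conflicts, len(merged))
```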
Upvotes: 1
Reputation: 163277
You might use a regex pattern that matches the part before the colon in group 1 and the part after the colon in group 2.
Then assert that after group 2 there is either another part starting with a + followed by a :, or the end of the string.
Then create a dictionary, stripping the group 1 and group 2 values.
(?:\s*\+\s*)?([^:]+)\s*:\s*([^:]+)(?=\+[^:+]*:|$)
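To see what the lookahead does, here is a sketch applying this pattern with re.findall to one line from each string. Because [^:] also matches newlines, this version is applied to a single line at a time. In the TOP line the + stays inside the value, since the text after it contains no colon; in the Top Fabric line, "+ Top Length:" satisfies the lookahead, so the value ends before it and a second pair is matched.

```python
import re

# The pattern from above; [^:] also matches newlines, so use it on one line at a time
pattern = r"(?:\s*\+\s*)?([^:]+)\s*:\s*([^:]+)(?=\+[^:+]*:|$)"

lines = ["TOP : Cotton + Embroidered ( 2 Mtr) ",
         "Top Fabric: Cotton Cambric + Top Length: 0-2.00"]

results = []
for line in lines:
    # findall returns (group 1, group 2) tuples; strip the surrounding spaces
    pairs = [(k.strip(), v.strip()) for k, v in re.findall(pattern, line)]
    results.append(pairs)
    print(pairs)
```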
The pattern matches:
- (?:\s*\+\s*)? Optionally match a + sign between optional whitespace chars
- ([^:]+) Capture group 1, match any char except :
- \s*:\s* Match a : between optional whitespace chars
- ([^:]+) Capture group 2, match any char except :
- (?=\+[^:+]*:|$) Positive lookahead, assert either a + followed by a : to the right, or the end of the string

Example
import re
import pprint
pattern = r"(?:\s*\+\s*)?([^:\r\n]+)\s*:\s*([^:\r\n]+)\s*(?=\+[^:+\n]*:|$)"
s = ("TOP : Cotton + Embroidered ( 2 Mtr) \n"
"BOTTOM : Cotton + Solid (2 Mtr) \n"
"DUPATTA : Chiffon + Lace Work ( 2 Mtr) \n"
"TYPE : Un Stitched\n"
"COLOUR : Multi Colour \n"
"CONTAINS : 1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA\n"
"Country of Origin: India\n\n"
"Top Fabric: Cotton Cambric + Top Length: 0-2.00\n"
"Bottom Fabric: Cotton Cambric + Bottom Length: 0-2.00\n"
"Dupatta Fabric: Nazneen + Dupatta Length: 0-2.00\n"
"Lining Fabric: Cotton Cambric\n"
"Type: Un Stitched\n"
"Pattern: Printed\n"
"Multipack: 3 Top\n"
"Country of Origin: India")
dictionary = {}
for m in re.finditer(pattern, s, re.MULTILINE):
    dictionary[m.group(1).strip()] = m.group(2).strip()
pprint.pprint(dictionary)
Output
{'BOTTOM': 'Cotton + Solid (2 Mtr)',
'Bottom Fabric': 'Cotton Cambric',
'Bottom Length': '0-2.00',
'COLOUR': 'Multi Colour',
'CONTAINS': '1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA',
'Country of Origin': 'India',
'DUPATTA': 'Chiffon + Lace Work ( 2 Mtr)',
'Dupatta Fabric': 'Nazneen',
'Dupatta Length': '0-2.00',
'Lining Fabric': 'Cotton Cambric',
'Multipack': '3 Top',
'Pattern': 'Printed',
'TOP': 'Cotton + Embroidered ( 2 Mtr)',
'TYPE': 'Un Stitched',
'Top Fabric': 'Cotton Cambric',
'Top Length': '0-2.00',
'Type': 'Un Stitched'}
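Since the pattern has exactly two groups, re.findall returns a list of (key, value) tuples, so the loop can also be written as a dict comprehension. A sketch using the same pattern and a shortened input (three lines only, for brevity); re.MULTILINE is still needed so that $ matches before each newline.

```python
import re

pattern = r"(?:\s*\+\s*)?([^:\r\n]+)\s*:\s*([^:\r\n]+)\s*(?=\+[^:+\n]*:|$)"
s = ("TOP : Cotton + Embroidered ( 2 Mtr) \n"
     "Top Fabric: Cotton Cambric + Top Length: 0-2.00\n"
     "Country of Origin: India")

# findall yields (group 1, group 2) tuples because the pattern has two groups
dictionary = {k.strip(): v.strip() for k, v in re.findall(pattern, s, re.MULTILINE)}
print(dictionary)
```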
Upvotes: 1