How to properly split a string to create a dictionary in Python?

Question

I have two strings:

"TOP : Cotton + Embroidered ( 2 Mtr) BOTTOM : Cotton + Solid (2 Mtr) DUPATTA : Chiffon + Lace Work ( 2 Mtr) TYPE : Un Stitched COLOUR : Multi Colour CONTAINS : 1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA Country of Origin: India"

and the second one is:

"Top Fabric: Cotton Cambric + Top Length: 0-2.00 Bottom Fabric: Cotton Cambric + Bottom Length: 0-2.00 Dupatta Fabric: Nazneen + Dupatta Length: 0-2.00 Lining Fabric: Cotton Cambric Type: Un Stitched Pattern: Printed Multipack: 3 Top Country of Origin: India"

I need to create a Python dictionary from each of these strings, using the text before each colon as the keys.

For example, in the first string the keys would be:

TOP,BOTTOM,DUPATTA,TYPE,COLOUR,CONTAINS,COUNTRY OF ORIGIN

and in the second one the keys would be:

Top Fabric,Bottom Fabric,Top Length,Bottom Length,Dupatta Fabric,Dupatta Length,Lining Fabric,Type,Pattern,Multipack,Country of Origin

So far I have tried:

keys = ["Top Fabric","Bottom Fabric","Dupatta Fabric","Lining Fabric","Type","Pattern","Multipack","TOP ","BOTTOM ","  DUPATTA ","COLOUR ","CONTAINS ","TYPE ","Country"] 

pattern = re.compile('({})\s+'.format(':|'.join(keys))) 
newdict = dict(zip(*[(i.strip() for i in (pattern.split(desc.replace("*",""))) if i)]*2))

but it does not work on the first string, and on the second string it does not create every key and value.

The fourth bird · Accepted Answer

You might use a regex pattern that matches the part before the colon in group 1 and after the colon in group 2.

Then assert that group 2 is followed either by another part that starts with a + and contains a :, or by the end of the string (the end of each line when re.MULTILINE is used, as in the example).

Then create a dictionary, stripping the group 1 and group 2 values.

(?:\s*\+\s*)?([^:]+)\s*:\s*([^:]+)(?=\+[^:+]*:|$)

The pattern matches:

(?:\s*\+\s*)? Optionally match a + sign between optional whitespace chars
([^:]+) Capture group 1, match one or more chars other than : (the key)
\s*:\s* Match a : between optional whitespace chars
([^:]+) Capture group 2, match one or more chars other than : (the value)
(?=\+[^:+]*:|$) Positive lookahead, assert that what follows is either a + and then a : with no other + or : in between (the start of the next key), or the end of the string
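To see how the lookahead decides where a value ends, here is a minimal sketch that applies the pattern above to two single lines taken from the question. The variable names are only illustrative.

```python
import re

# The pattern explained above, applied to one line at a time.
pattern = re.compile(r"(?:\s*\+\s*)?([^:]+)\s*:\s*([^:]+)(?=\+[^:+]*:|$)")

# A "+" that is followed by "key:" starts a new pair, so this line
# yields two key/value pairs.
line = "Top Fabric: Cotton Cambric + Top Length: 0-2.00"
pairs = [(m.group(1).strip(), m.group(2).strip()) for m in pattern.finditer(line)]
print(pairs)

# A "+" with no colon after it stays inside the value, so this line
# yields a single pair.
line2 = "TOP : Cotton + Embroidered ( 2 Mtr)"
pairs2 = [(m.group(1).strip(), m.group(2).strip()) for m in pattern.finditer(line2)]
print(pairs2)
```

Group 2 is greedy, so it first grabs as much as it can and only gives characters back until the lookahead succeeds. That is why the + inside "Cotton + Embroidered" is kept as part of the value.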


Example

import re
import pprint

pattern = r"(?:\s*\+\s*)?([^:\n]+)\s*:\s*([^:\n]+)\s*(?=\+[^:+\n]*:|$)"

s = ("TOP : Cotton + Embroidered ( 2 Mtr) \n"
     "BOTTOM : Cotton + Solid (2 Mtr) \n"
     "DUPATTA : Chiffon + Lace Work ( 2 Mtr) \n"
     "TYPE : Un Stitched\n"
     "COLOUR : Multi Colour \n"
     "CONTAINS : 1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA\n"
     "Country of Origin: India\n\n"
     "Top Fabric: Cotton Cambric + Top Length: 0-2.00\n"
     "Bottom Fabric: Cotton Cambric + Bottom Length: 0-2.00\n"
     "Dupatta Fabric: Nazneen + Dupatta Length: 0-2.00\n"
     "Lining Fabric: Cotton Cambric\n"
     "Type: Un Stitched\n"
     "Pattern: Printed\n"
     "Multipack: 3 Top\n"
     "Country of Origin: India")

dictionary = {}
for m in re.finditer(pattern, s, re.MULTILINE):
    dictionary[m.group(1).strip()] = m.group(2).strip()
pprint.pprint(dictionary)

Output

{'BOTTOM': 'Cotton + Solid (2 Mtr)',
 'Bottom Fabric': 'Cotton Cambric',
 'Bottom Length': '0-2.00',
 'COLOUR': 'Multi Colour',
 'CONTAINS': '1 TOP WITH LINING 1 BOTTOM & 1 DUPATTA',
 'Country of Origin': 'India',
 'DUPATTA': 'Chiffon + Lace Work ( 2 Mtr)',
 'Dupatta Fabric': 'Nazneen',
 'Dupatta Length': '0-2.00',
 'Lining Fabric': 'Cotton Cambric',
 'Multipack': '3 Top',
 'Pattern': 'Printed',
 'TOP': 'Cotton + Embroidered ( 2 Mtr)',
 'TYPE': 'Un Stitched',
 'Top Fabric': 'Cotton Cambric',
 'Top Length': '0-2.00',
 'Type': 'Un Stitched'}
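
The example above puts both descriptions into one string, so they end up in a single dictionary and the repeated 'Country of Origin' key is stored only once. If you want one dictionary per string, as the question asks, a small wrapper could look like the sketch below. The name parse_fields is hypothetical, the two inputs are shortened versions of the question's strings, and it assumes each description has its fields on separate lines, as in the example.

```python
import re

# Same pattern as in the example; [^:\n] keeps a key or value on one line,
# and re.MULTILINE lets $ match at the end of every line.
FIELD = re.compile(
    r"(?:\s*\+\s*)?([^:\n]+)\s*:\s*([^:\n]+)\s*(?=\+[^:+\n]*:|$)",
    re.MULTILINE,
)

def parse_fields(text):
    """Hypothetical helper: return a {key: value} dict for one description."""
    return {m.group(1).strip(): m.group(2).strip() for m in FIELD.finditer(text)}

# Shortened sample inputs, assuming newline-separated fields.
first = ("TOP : Cotton + Embroidered ( 2 Mtr) \n"
         "TYPE : Un Stitched\n"
         "Country of Origin: India")
second = ("Top Fabric: Cotton Cambric + Top Length: 0-2.00\n"
          "Type: Un Stitched\n"
          "Country of Origin: India")

first_dict = parse_fields(first)
second_dict = parse_fields(second)
print(first_dict)
print(second_dict)
```

Compiling the pattern once and calling the helper per string keeps the two results separate, so keys such as 'Country of Origin' that appear in both descriptions do not overwrite each other.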
