mmarboeuf

Reputation: 47

How to parse a string in Python

How can I parse strings that each contain n parameters in arbitrary order, such as:

{ UserID : 36875;  tabName : QuickAndEasy}
{ RecipeID : 1150;  UserID : 36716}
{ isFromLabel : 0;  UserID : 36716;  type : recipe;  searchWord : soup}
{ UserID : 36716;  tabName : QuickAndEasy}

Ultimately I'm looking to output each parameter in its own column of a table.

Upvotes: 0

Views: 112

Answers (2)

Scooter

Reputation: 7061

lines =  "{ UserID : 36875;  tabName : QuickAndEasy } ",  \
         "{ RecipeID : 1150;  UserID : 36716}",  \
         "{ isFromLabel : 0;  UserID : 36716;  type : recipe;  searchWord : soup}" , \
         "{ UserID : 36716;  tabName : QuickAndEasy}"

counter = 0

mappedLines = {}

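for line in lines:
    counter = counter + 1
    lineDict = {}
    # Drop the surrounding braces and whitespace.
    line = line.replace("{","")
    line = line.replace("}","")
    line = line.strip()
    # Each "key : value" pair is separated by a semicolon.
    fieldPairs = line.split(";")

    for pair in fieldPairs:
        # Split on the first colon only, so a value may itself contain colons.
        fields = pair.split(":", 1)
        key = fields[0].strip()
        value = fields[1].strip()
        lineDict[key] = value

    # Store the parsed line under its 1-based line number.
    mappedLines[counter] = lineDict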

def printField(key, lineSets, comma_desired = True):
    if key in lineSets:
        print(lineSets[key],end="")
    if comma_desired:
        print(",",end="")
    else:
        print()

for key in range(1,len(mappedLines) + 1):
    lineSets = mappedLines[key]
    printField("UserID",lineSets)
    printField("tabName",lineSets)
    printField("RecipeID",lineSets)
    printField("type",lineSets)
    printField("searchWord",lineSets)
    printField("isFromLabel",lineSets,False)

CSV output:

36875,QuickAndEasy,,,,
36716,,1150,,,
36716,,,recipe,soup,0
36716,QuickAndEasy,,,,

The code above was written for Python 3.4. You can get the same output with Python 2.7 by replacing the printField function and the last for loop with this:

def printFields(keys, lineSets):
    output_line = ""
    for key in keys:
        if key in lineSets:
            output_line = output_line + lineSets[key] + ","
        else:
            output_line += ","
    print output_line[0:len(output_line) - 1]

fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]

for key in range(1,len(mappedLines) + 1):
    lineSets = mappedLines[key]
    printFields(fields,lineSets)
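
If you are on Python 3, the standard csv module could take over the comma handling instead of the hand-written print functions. This is only a sketch of that alternative, not what the code above does: rows_to_csv and sample_rows are made-up names for illustration, and the column order is assumed to be the same fixed field list used above.

```python
import csv
import io

# Assumption: the column order is fixed up front, as in the field list above.
FIELDS = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]

def rows_to_csv(rows, fieldnames=FIELDS):
    """Render a list of dicts as CSV text, leaving missing keys empty."""
    buffer = io.StringIO()
    # restval is the value written for any key a row does not contain.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical sample: the first two parsed lines from the question.
sample_rows = [
    {"UserID": "36875", "tabName": "QuickAndEasy"},
    {"RecipeID": "1150", "UserID": "36716"},
]
print(rows_to_csv(sample_rows), end="")
```

csv.DictWriter also quotes values that contain commas, which the manual string concatenation would not do.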

Upvotes: 0

Tim Pietzcker

Reputation: 336158

The regex ([^{}\s:]+)\s*:\s*([^{}\s;]+) works on your examples. You need to be aware, though, that all the matches will be strings, so if you want to store 36875 as a number, you'll need to do some additional processing.

import re
regex = re.compile(
    r"""(        # Match and capture in group 1:
     [^{}\s:]+   # One or more characters except braces, whitespace or :
    )            # End of group 1
    \s*:\s*      # Match a colon, optionally surrounded by whitespace
    (            # Match and capture in group 2:
     [^{}\s;]+   # One or more characters except braces, whitespace or ;
    )            # End of group 2""", 
    re.VERBOSE)

You can then do

>>> dict(regex.findall("{ isFromLabel : 0;  UserID : 36716;  type : recipe;  searchWord : soup}"))
{'UserID': '36716', 'isFromLabel': '0', 'searchWord': 'soup', 'type': 'recipe'}
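
The additional processing for numbers could look something like the sketch below. It assumes that any value consisting only of digits (with an optional leading minus sign) is meant to be an integer, which may not hold for your data; parse_line is a hypothetical helper name, and the pattern is the same regex written on one line.

```python
import re

# Same pattern as the verbose regex, written on one line.
PAIR_RE = re.compile(r"([^{}\s:]+)\s*:\s*([^{}\s;]+)")
# Assumption: only plain ASCII digits, optionally negative, count as integers.
INT_RE = re.compile(r"-?[0-9]+")

def parse_line(line):
    """Return the key/value pairs of one line, converting integer-like values."""
    result = {}
    for key, value in PAIR_RE.findall(line):
        result[key] = int(value) if INT_RE.fullmatch(value) else value
    return result

parsed = parse_line(
    "{ isFromLabel : 0;  UserID : 36716;  type : recipe;  searchWord : soup}"
)
print(parsed)
```

Here UserID and isFromLabel come back as int, while type and searchWord stay strings.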

Test it live on regex101.com.

Upvotes: 1
