Reputation: 157

Matching multiple patterns in a string

I have a string that looks like this:

s = "[A] text [B] more text [C] something ... [A] hello"

Basically, it consists of [X] markers, and I am trying to get the text after every [X].

I would like to produce this dict (I don't care about the order):

mydict = {"A":"text, hello", "B":"more text", "C":"something"}

I was thinking about using a regex, but I was not sure whether that is the right choice, because in my case the order of [A], [B] and [C] can change, so this string is valid too:

s = "[A] hello, [C] text [A] more text [B] something"

I don't know how to properly extract the text. Can anyone point me in the right direction? Thanks.

Upvotes: 1

Answers (3)

Jack Evans

Reputation: 1717

Not sure if this is quite what you're looking for, but note that it fails with duplicate keys:

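import re

s = "[A] hello, [C] text [A] more text [B] something"

# Text between the [X] markers (drops the empty string before the first marker)
results = [text.strip() for text in re.split(r'\[.\]', s) if text]

# The single character inside each [X] marker
letters = re.findall(r'\[(.)\]', s)

dict(zip(letters, results))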

{'A': 'more text', 'B': 'something', 'C': 'text'}

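That happens because zip pairs each letter with the text at the same position, so the second 'A' overwrites the first. The intermediate lists look like this: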

In [49]: results
Out[49]: ['hello,', 'text', 'more text', 'something']

In [50]: letters
Out[50]: ['A', 'C', 'A', 'B']

To handle duplicates, you could do something like this:

mappings = {}

for pos, letter in enumerate(letters):
    try:
        mappings[letter] += ' ' + results[pos]
    except KeyError:
        mappings[letter] = results[pos]

which gives: {'A': 'hello, more text', 'B': 'something', 'C': 'text'}

UPDATE

Or, even better, you could use a defaultdict (from the collections module).
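A minimal sketch of the defaultdict idea, assuming one-character tags as in the \[.\] pattern above. It uses re.split with a capturing group, which keeps the captured letters in the result list, so letters and text stay aligned even when a tag is followed by no text (the "if text" filter above can shift them out of step, e.g. for "[A][B] x"). The function name collect and its sep parameter are hypothetical; sep=' ' mirrors the loop above, sep=', ' gives the format from the question.

```python
import re
from collections import defaultdict


def collect(s, sep=' '):
    # A capturing group makes re.split keep the letters:
    # ['', 'A', ' hello, ', 'C', ' text ', 'A', ' more text ', 'B', ' something']
    parts = re.split(r'\[(.)\]', s)
    mappings = defaultdict(list)
    # parts[1::2] are the letters, parts[2::2] the text following each one
    for letter, text in zip(parts[1::2], parts[2::2]):
        mappings[letter].append(text.strip())
    # Join the texts collected for each letter into one string
    return {letter: sep.join(texts) for letter, texts in mappings.items()}


s = "[A] hello, [C] text [A] more text [B] something"
print(collect(s))
# {'A': 'hello, more text', 'C': 'text', 'B': 'something'}
```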

Upvotes: 4

ZnArK

Reputation: 1539

Here's a simple solution:

#!/usr/bin/env python3

import re
s = "[A] text [B] more text [C] something ... [A] hello"
d = dict()
# Each match is one "[tag] text" chunk, up to (not including) the next "["
for x in re.findall(r"\[[^\]]+\][^\[]*", s):
    # Split the chunk into the tag (group 1) and the text after it (group 2)
    m = re.match(r"\[([^\]]*)\](.*)", x)

    if m.group(1) not in d:
        # Key doesn't already exist
        d[m.group(1)] = m.group(2)
    else:
        d[m.group(1)] = "%s, %s" % (d[m.group(1)], m.group(2))

print(d)

Prints:

{'A': ' text ,  hello', 'B': ' more text ', 'C': ' something ... '}
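The findall-then-match loop can be collapsed into a single pass with re.finditer, capturing the tag and its text in one pattern and stripping the surrounding whitespace. This is only a sketch, assuming tags never contain ']' and the text never contains '['; like the pattern above, it also accepts multi-character tags such as [AB]. The name parse_tags is hypothetical.

```python
import re


def parse_tags(s):
    d = {}
    # group 1: tag name inside [...]; group 2: everything up to the next '['
    for m in re.finditer(r'\[([^\]]+)\]([^\[]*)', s):
        # Collect every text for a tag in a list, in order of appearance
        d.setdefault(m.group(1), []).append(m.group(2).strip())
    return {tag: ', '.join(texts) for tag, texts in d.items()}


print(parse_tags("[A] text [B] more text [C] something ... [A] hello"))
# {'A': 'text, hello', 'B': 'more text', 'C': 'something ...'}
```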

Upvotes: 0

double_j

Reputation: 1716

Expected output: mydict = {"A":"text, hello", "B":"more text", "C":"something"}

import re

s = "[A] text [B] more text [C] something ... [A] hello"

pattern = r'\[([A-Z])\]([ a-z]+)'

items = re.findall(pattern, s)

output_dict = {}

for x in items:
    if x[0] in output_dict:
        output_dict[x[0]] = output_dict[x[0]] + ', ' + x[1].strip()
    else:
        output_dict[x[0]] = x[1].strip()

print(output_dict)

>>> {'A': 'text, hello', 'B': 'more text', 'C': 'something'}
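One caveat: the character class [ a-z]+ only accepts spaces and lowercase letters, so text containing capitals, digits or punctuation is cut short (it only matches the expected output here because the "..." after "something" happens to get dropped). A possible fix, sketched below under the assumption that the text itself never contains '[', is to capture everything up to the next '[' with [^\[]*. The names NARROW, WIDE and parse are hypothetical.

```python
import re

NARROW = r'\[([A-Z])\]([ a-z]+)'  # pattern from the answer above
WIDE = r'\[([A-Z])\]([^\[]*)'     # stops only at the next '['


def parse(s, pattern):
    output_dict = {}
    for key, text in re.findall(pattern, s):
        text = text.strip()
        if key in output_dict:
            output_dict[key] += ', ' + text
        else:
            output_dict[key] = text
    return output_dict


s = "[A] Hello world [B] 42 items [A] bye"
print(parse(s, NARROW))  # {'A': ', bye', 'B': ''}
print(parse(s, WIDE))    # {'A': 'Hello world, bye', 'B': '42 items'}
```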

Upvotes: 1
