vlad

Reputation: 855

Use groupdict to parse string to dict

I need to process the text to create a dictionary {name: quantity}.

Variants of text:

2 Cardname
3 Cardname Two
1 Cardname Three

Cardname
Cardname Two
Cardname Three

So I wrote some basic code:

import re

card_list = card_area.splitlines()
card_dict = {}

for card in card_list:
    # Leading digits are the quantity; default to 1 when absent.
    qty_re = re.search(r'^\d{1,6}', card)
    if qty_re:
        qty = int(qty_re.group())
    else:
        qty = 1

    # Trailing letters and spaces are the card name.
    name_re = re.search(r'[A-Za-z ]+$', card)
    if name_re:
        name = name_re.group().strip()
    else:
        name = None

    if name:
        card_dict[name] = qty

The first question: can I use the groupdict method if some elements of the string are missing (no quantity, or an empty string)?

Second: I also want to consider such formats:

2 x Cardname
3x Cardname Two
1 xCardname Three
1xCardname Four

What is the best way?

Upvotes: 2

Views: 5433

Answers (2)

steveha

Reputation: 76735

A solution. Notes to follow.

from collections import defaultdict
import re

# card_list = card_area.splitlines()
card_list = [
    "2 Cardname", "3 Cardname Two", "1 Cardname Three",
    "Cardname", "Cardname Two", "Cardname Three",
    "1x Cardname", "4X Cardname Two", "2 X Cardname Three",
]

card_dict = defaultdict(int)

pat = re.compile(r'(\d*)\s*(?:[xX]\s+)?(\S.*)')

for card in card_list:
    m = pat.search(card)
    if not m:
        continue
    if m.group(1):
        qty = int(m.group(1))
    else:
        qty = 1

    name = m.group(2)
    card_dict[name] += qty


if not card_dict:
    print("empty card_dict!")
else:
    for name in sorted(card_dict):
        print("%20s|%4d" % (name, card_dict[name]))

Notes:

  • I recommend pre-compiling the regular expression pattern, for speed.

  • The best way to handle this is a single regular expression pattern that grabs both the count and the card. I have added an optional pattern that recognizes card formats with the optional 'x'; using a character class I made it match either upper- or lower-case 'x'. The white space between the number and the 'x' is optional but there must be white space between the 'x' and the card name, or else the 'x' will be treated as part of the card name.

  • If you are not familiar with regular expressions, here is how to read this one: form a match group that matches zero or more digits. This is followed by zero or more white space characters. This is followed by another group, but this following group is flagged with (?: rather than just ( so it is a group but will not make a match group in the output; this group is a character class matching 'x' or 'X' followed by one or more white space characters. Form another match group, which starts with one non-whitespace character and is followed by zero or more of any character.

  • I believe you want to sum all the cards of the same name? The best way to do that is to use defaultdict(int), as I showed here.

  • If no legal card name ever starts with 'x' or 'X', you could change the pattern to not keep the 'x' even when there is no space between it and the card name. To do that, change the pattern to match the 'x' from this: (?:[xX]\s+)? to this: (?:[xX]\s*)? (Note that a single + changed to a single * after the \s, so zero whitespace characters will now be accepted.)
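To make the difference between the two variants of the 'x' group concrete, here is a minimal sketch that runs both on the same input. The names strict, loose and split_card are only illustrative helpers, not part of the solution above.

```python
import re

# Variant from the solution: whitespace is required after the 'x'.
strict = re.compile(r'(\d*)\s*(?:[xX]\s+)?(\S.*)')
# Variant from the last note: whitespace after the 'x' is optional.
loose = re.compile(r'(\d*)\s*(?:[xX]\s*)?(\S.*)')


def split_card(pat, line):
    """Return (name, qty) for one line, with qty defaulting to 1."""
    m = pat.search(line)
    qty = int(m.group(1)) if m.group(1) else 1
    return m.group(2), qty


print(split_card(strict, "1xCardname Four"))  # the 'x' stays in the name
print(split_card(loose, "1xCardname Four"))   # the 'x' is dropped
print(split_card(loose, "Xenagos"))           # a leading 'X' is eaten too
```

The last call shows the trade-off: with the loose pattern, a card whose name really starts with 'x' or 'X' loses its first letter.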

Upvotes: 1

Gareth Latty

Reputation: 89087

You can do this with a single regex:

import re

regex = re.compile(r'(\d*)([A-Za-z ]+)$')
card_list = ["2 Cardname", "3 Cardname Two", "Cardname Three"]
card_dict = {}

for quantity, name in (regex.match(card).groups() for card in card_list):
    if not quantity:
        quantity = 1
    card_dict[name.strip()] = int(quantity)

print(card_dict)

Giving us:

{'Cardname': 2, 'Cardname Two': 3, 'Cardname Three': 1}

groupdict() won't build this dictionary for you directly: it returns a dict mapping each named group to the text it matched (group_name: match), not one match to another (name: quantity). Instead, we run the match and call groups(), which returns a tuple of the captured strings.
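That said, groupdict() can still cover the "missing quantity" part of the question if the pattern uses named groups. The sketch below is one possible way, not the code from this answer: the group names qty and name and the helper parse_line are made up for illustration. It relies on the default argument of groupdict(), which is substituted for any group that did not take part in the match.

```python
import re

# Named groups; the quantity group is optional.
pattern = re.compile(r'^(?P<qty>\d+)?\s*(?P<name>[A-Za-z ]+)$')


def parse_line(line):
    """Return (name, qty) for one line, or None if it does not match."""
    m = pattern.match(line.strip())
    if m is None:
        return None  # e.g. an empty line
    # Groups that did not participate in the match get the default value.
    fields = m.groupdict(default='1')
    return fields['name'].strip(), int(fields['qty'])


card_dict = {}
for line in ["2 Cardname", "Cardname Two", ""]:
    parsed = parse_line(line)
    if parsed:
        name, qty = parsed
        card_dict[name] = qty

print(card_dict)
```

You still have to assemble the {name: quantity} dict yourself; groupdict() only labels the pieces of each match.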

Supporting the notation with an extra x is easy; we just add it to the regex, with optional whitespace on either side:

regex = re.compile(r'(\d*)\s*x?\s*([A-Za-z ]+)$')

By matching x? we match the x if it is there and skip it if it is not; the \s* on each side lets the pattern also handle "2 x Cardname" and "1 xCardname Three". The only potential issue here is a card name that begins with an x.

Note that if you can assume that the number will always be there, you can do this as a one-liner:

{name.strip(): int(quantity) for quantity, name in (regex.match(card).groups() for card in card_list)}

Although I would argue this is pushing the bounds of readability.
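One caveat with calling .groups() directly on the result of regex.match(): match() returns None for a line that does not fit the pattern (an empty line, for instance), and the call then raises AttributeError. A minimal sketch of a more defensive loop, using the first regex above; the function name parse_cards is just for illustration:

```python
import re

regex = re.compile(r'(\d*)([A-Za-z ]+)$')


def parse_cards(lines):
    """Build {name: quantity}, skipping lines the regex cannot match."""
    card_dict = {}
    for line in lines:
        m = regex.match(line)
        if m is None:
            continue  # empty or malformed line
        quantity, name = m.groups()
        card_dict[name.strip()] = int(quantity) if quantity else 1
    return card_dict


print(parse_cards(["2 Cardname", "", "Cardname Three", "???"]))
```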

Upvotes: 1
