Use groupdict to parse string to dict

Question

I need to process some text to create a dictionary {name: quantity}.

Variants of text:

2 Cardname
3 Cardname Two
1 Cardname Three

Cardname
Cardname Two
Cardname Three

So I wrote some basic code:

import re

card_list = card_area.splitlines()
card_dict = {}

for card in card_list:
    # Optional quantity: up to six digits at the start of the line
    qty_re = re.search(r'^\d{1,6}', card)
    if qty_re:
        qty = qty_re.group()
    else:
        qty = 1

    # Card name: letters and spaces at the end of the line
    name_re = re.search(r'[A-Za-z ]+$', card)
    if name_re:
        name = name_re.group()
    else:
        name = None

    if name:
        card_dict[name] = qty

First question: can I use the groupdict method if some parts of a string are missing (no quantity, or an empty string)?

Second: I also want to handle formats such as:

2 x Cardname
3x Cardname Two
1 xCardname Three
1xCardname Four

What is the best way to do this?

steveha · Accepted Answer

A solution. Notes to follow.

from collections import defaultdict
import re

# card_list = card_area.splitlines()
card_list = [
    "2 Cardname", "3 Cardname Two", "1 Cardname Three",
    "Cardname", "Cardname Two", "Cardname Three",
    "1x Cardname", "4X Cardname Two", "2 X Cardname Three",
]

card_dict = defaultdict(int)

pat = re.compile(r'(\d*)\s*(?:[xX]\s+)?(\S.*)')

for card in card_list:
    m = pat.search(card)
    if not m:
        continue
    if m.group(1):
        qty = int(m.group(1))
    else:
        qty = 1

    name = m.group(2)
    card_dict[name] += qty


if not card_dict:
    print("empty card_dict!")
else:
    for name in sorted(card_dict):
        print("%20s|%4d" % (name, card_dict[name]))

Notes:

I recommend pre-compiling the regular expression pattern, for speed.
The best way to handle this is a single regular expression pattern that grabs both the count and the card. I have added an optional pattern that recognizes card formats with the optional 'x'; using a character class I made it match either upper- or lower-case 'x'. The white space between the number and the 'x' is optional but there must be white space between the 'x' and the card name, or else the 'x' will be treated as part of the card name.
If you are not familiar with regular expressions, here is how to read this one: form a match group that matches zero or more digits. This is followed by zero or more white space characters. This is followed by another group, but this following group is flagged with (?: rather than just ( so it is a group but will not make a match group in the output; this group is a character class matching 'x' or 'X' followed by one or more white space characters. Form another match group, which starts with one non-whitespace character and is followed by zero or more of any character.
I believe you want to sum all the cards of the same name? The best tool for that is defaultdict, as shown here: a missing name starts at 0, so the quantity can simply be added.
If no legal card name ever starts with 'x' or 'X', you could change the pattern so it drops the 'x' even when there is no space between it and the card name. To do that, change the part of the pattern that matches the 'x' from this: (?:[xX]\s+)? to this: (?:[xX]\s*)? (Note that the single + after the \s becomes a single *, so zero whitespace characters will now be accepted.)
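Since the question asked about groupdict specifically, here is a minimal sketch of the same pattern written with named groups, so that the match can be read back as a dict. The group names qty and name, the constant CARD_RE, and the helpers parse_card and parse_cards are illustrative choices, not part of the answer's code. The sketch uses the (?:[xX]\s*)? variant, so it assumes that no card name starts with 'x' or 'X'; a name such as "Xenon" would lose its first letter.

```python
import re
from collections import defaultdict

# Named groups make the match available as a dict via groupdict().
# Assumption: no card name starts with 'x' or 'X', so the separator
# may be followed by zero whitespace ("1xCardname Four").
CARD_RE = re.compile(r"""
    (?P<qty>\d*)      # optional quantity; empty string when absent
    \s*
    (?:[xX]\s*)?      # optional 'x' separator, not captured
    (?P<name>\S.*)    # card name: starts with a non-whitespace character
""", re.VERBOSE)


def parse_card(line):
    """Return (name, qty) for one line, or None if the line has no name."""
    m = CARD_RE.match(line.strip())
    if not m:
        return None
    fields = m.groupdict()  # e.g. {'qty': '2', 'name': 'Cardname'}
    # A missing quantity is captured as '', which falls back to 1.
    return fields["name"], int(fields["qty"] or 1)


def parse_cards(text):
    """Sum quantities per card name across all lines of text."""
    totals = defaultdict(int)
    for line in text.splitlines():
        parsed = parse_card(line)
        if parsed:
            name, qty = parsed
            totals[name] += qty
    return dict(totals)
```

A group that matches nothing still appears in groupdict(), here as an empty string, because \d* always participates in the match. Blank lines are skipped because the name group requires at least one non-whitespace character.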
