Brandon Hunt

Reputation: 11

Break down a string with regex

I have some example strings I need to process:

string1 = "_Wondrous item, common (requires attunement by a wizard or cleric)_"
string2 = "_Weapon (glaive), rare (requires attunement)_"
string3 = "_Wondrous item, common_"

I want to break them down into the following:

group1 = {
  type: "Wondrous item",
  rarity: "common",
  attune: true,
  class: "wizard or cleric"
}
group2 = {
  type: "Weapon (glaive)",
  rarity: "rare",
  attune: true
}
group3 = {
  type: "Wondrous item",
  rarity: "common",
  attune: false
}

The regex I have currently is messy and probably inefficient, and it only breaks down the first string.

regex = /_(?<type>[^:]*),\s(?<rarity>[^:]*)\s\((?<attune>[^:]+)by a(?<class>[^:]*)\)_/U


Upvotes: 1

Views: 216

Answers (1)

The fourth bird

Reputation: 163557

To get all groups for the 3 lines using your pattern:

_(?<type>[^:]*?),\s+(?<rarity>[^:]*?)(?:\s+\((?<attune>[^:]+?)\s*(?:by\s+a\s+(?<class>[^:]*?))?\))?_
  • _(?<type>[^:]*?) Match _, then group type matches any char except :, non-greedy
  • ,\s+ Match , and 1 or more whitespace chars
  • (?<rarity>[^:]*?) Group rarity matches any char except :, non-greedy
  • (?: Non-capture group
    • \s+\( Match 1 or more whitespace chars and (
    • (?<attune>[^:]+?)\s* Group attune matches 1 or more chars except :, non-greedy, followed by optional whitespace chars
    • (?:by\s+a\s+(?<class>[^:]*?))? Optionally match by a and group class, which matches any char except :, non-greedy
    • \) Match )
  • )?_ Make the outer group optional and match _

See a regex demo.

If named capture groups are supported, you can read the values from the groups property of the match and update the object accordingly.

const regex = /_(?<type>[^:]*?),\s+(?<rarity>[^:]*?)(?:\s+\((?<attune>[^:]+?)\s*(?:by\s+a\s+(?<class>[^:]*?))?\))?_/;
[
  "_Wondrous item, common (requires attunement by a wizard or cleric)_",
  "_Weapon (glaive), rare (requires attunement)_",
  "_Wondrous item, common_"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    // Drop the class key when no class restriction was captured
    if (m.groups.class === undefined) {
      delete m.groups.class;
    }
    // Convert the captured attunement text (or its absence) into a boolean
    m.groups.attune = m.groups.attune !== undefined;
    console.log(m.groups);
  }
});

Note that the negated character classes in your pattern exclude :, but there is no : in the example data.

For the first negated character class you can exclude the comma instead, and for the others exclude the parentheses, to get the same result.

That way not all quantifiers have to be non-greedy, which can prevent some unnecessary backtracking.

_(?<type>[^,]*),\s(?<rarity>[^:()]*)(?:\s\((?<attune>[^()]+?)\s*(?:by a\s+(?<class>[^()]*))?\))?_

See another regex demo.
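
As a rough sketch of how that second pattern could be wrapped up, here is a small helper that returns an object in the shape asked for in the question. The names parseItemLine and itemPattern are made up for illustration, and it assumes an environment that supports named capture groups.

```javascript
// Hypothetical helper (not part of the original answer): parse one item line
// into { type, rarity, attune, class? } using the tightened pattern above.
const itemPattern = /_(?<type>[^,]*),\s(?<rarity>[^:()]*)(?:\s\((?<attune>[^()]+?)\s*(?:by a\s+(?<class>[^()]*))?\))?_/;

function parseItemLine(line) {
  const m = line.match(itemPattern);
  if (!m) {
    return null; // the line does not follow the expected format
  }
  // "class" is a reserved word, so rename it while destructuring
  const { type, rarity, attune, class: cls } = m.groups;
  // attune becomes a boolean: true when the parenthesized part was present
  const result = { type, rarity, attune: attune !== undefined };
  if (cls !== undefined) {
    result.class = cls; // only set when a class restriction was captured
  }
  return result;
}

[
  "_Wondrous item, common (requires attunement by a wizard or cleric)_",
  "_Weapon (glaive), rare (requires attunement)_",
  "_Wondrous item, common_"
].forEach(s => console.log(parseItemLine(s)));
```

For the third string the attune value is false and the class key is left out, which matches group3 in the question. Returning null for a non-matching line is just one possible choice; throwing an error would work as well.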

Upvotes: 2
