Angezerus

Reputation: 45

Regex for splitting txt db file

I'm writing a converter from a text database file to SQL, and I need to split each row into its items. The problem is that some of the items are scripts, which can contain multiple commas (the separators in the db structure). The good news is that the scripts are enclosed in {}, which makes the job similar to parsing a CSV file. The catch is that a script can itself contain further scripts nested in {}, and this stops my regex from working.

Structure of the txt db:

501,Red_Potion,Red Potion,0,50,,70,,,,,0xFFFFFFFF,7,2,,,,,,{ itemheal rand(45,65),0; },{},{}
502,Orange_Potion,Orange Potion,0,200,,100,,,,,0xFFFFFFFF,7,2,,,,,,{ itemheal rand(105,145),0; },{},{}
503,Yellow_Potion,Yellow Potion,0,550,,130,,,,,0xFFFFFFFF,7,2,,,,,,{ itemheal rand(175,235),0; },{},{}
504,White_Potion,White Potion,0,1200,,150,,,,,0xFFFFFFFF,7,2,,,,,,{ itemheal rand(325,405),0; },{},{}

The regex that I use to match the delimiters for splitting:

,(?![^{}]*\})

This works fine until it encounters a more complicated nested script item, like:

1492,Velum_Glaive,Vellum Glaive,4,20,,4500,250,,3,0,0x00004082,7,2,34,4,95,1,5,{ bonus2 bAddRace,RC_DemiHuman,80; if(getrefine()>=6) { bonus2 bSkillAtk,"LK_SPIRALPIERCE",100; bonus2 bSkillAtk,"KN_SPEARBOOMERANG",50; } if(getrefine()>=9) { autobonus2 "{ bonus bShortWeaponDamageReturn,20; bonus bMagicDamageReturn,20; }",100,2000,BF_WEAPON|BF_MAGIC,"{ specialeffect2 EF_REFLECTSHIELD; }"; } },{},{}

So how do I make it match only the db structure delimiters and leave the commas in the script out?

Upvotes: 1

Views: 73

Answers (1)

Martin Ender

Reputation: 44279

This is not a job for regex. As I pointed out in my comment, nested structures are usually beyond what regex can do. PCRE has the recursion construct (?R) and .NET has balancing groups, but the solutions usually get really unreadable and unmaintainable.
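To make that concrete, here is a rough sketch of what such a PCRE pattern could look like in PHP. It first shows how the lookahead from the question splits a nested script in the wrong place, then uses a recursive group (?1) together with the backtracking verbs (*SKIP)(*F) to step over balanced {...} blocks. The function name split_row_regex and the shortened sample row are made up for illustration, and the pattern still ignores strings and comments, so treat it as a demonstration rather than a recommendation.

```php
<?php
// Hypothetical shortened row: a script with one nested block, then an empty script.
$row = '1,{ a,b; if(x) { c,d; } },{}';

// The lookahead from the question: the comma in "a,b" is followed by "{"
// before any "}", so it is wrongly treated as a delimiter.
$naive = preg_split('/,(?![^{}]*\})/', $row);
// $naive has 4 pieces: "1", "{ a", "b; if(x) { c,d; } }", "{}"

// Sketch using PCRE recursion (helper name is an assumption, not from the question).
function split_row_regex(string $row): array
{
    // Group 1 matches one balanced {...} block, recursing into itself via (?1).
    // (*SKIP)(*F) discards that block as a match, so only commas outside any
    // braces are left for the second alternative.
    $pattern = '/(\{(?:[^{}]++|(?1))*\})(*SKIP)(*F)|,/';
    $parts = preg_split($pattern, $row);
    if ($parts === false) {
        throw new RuntimeException('preg_split failed');
    }
    return $parts;
}

$fields = split_row_regex($row);
// $fields has 3 pieces: "1", "{ a,b; if(x) { c,d; } }", "{}"
```

Note that an unbalanced { makes group 1 fail at that position, so the commas after it are split as if they were top-level, with no error reported.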

On top of that, you don't just have to take {} into account, but also strings and comments in your script. You are much better off parsing the thing manually. Here is a quick and dirty example of how it could be done in PHP (ignoring strings and comments!):

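// $str holds one row of the db file
$level = 0;        // current {} nesting depth
$values = array(); // fields collected so far
$start = 0;        // offset where the current field begins
$length = strlen($str);
for($i = 0; $i < $length; $i++)
{
    switch($str[$i])
    {
    case ",":
        // only commas outside of any {} separate fields
        if(!$level) {
            $values[] = substr($str, $start, $i-$start);
            $start = $i+1;
        }
        break;
    case "{":
        $level++;
        break;
    case "}":
        $level--;
        if($level < 0) trigger_error("unexpected }");
        break;
    }
}
if($level > 0) trigger_error("missing }");
// everything after the last top-level comma is the final field
$values[] = substr($str, $start);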

You can already see that you basically end up with a simple parser for your script.
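Building on the loop above, a slightly more complete sketch might also skip over double-quoted strings, so that braces or commas inside string literals do not affect the nesting level. This assumes the script language uses "..." strings with backslash escapes, which the question does not confirm. The function name split_db_row is likewise hypothetical, and comments are still not handled.

```php
<?php
// Sketch only: split one db row on top-level commas, tracking {} nesting
// and (assumed) double-quoted strings with backslash escapes.
function split_db_row(string $str): array
{
    $values = [];
    $level = 0;        // current {} nesting depth
    $inString = false; // true while inside "..."
    $start = 0;        // offset where the current field begins
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];
        if ($inString) {
            if ($ch === '\\') {
                $i++;              // skip the escaped character
            } elseif ($ch === '"') {
                $inString = false; // closing quote
            }
            continue;
        }
        switch ($ch) {
            case '"':
                $inString = true;
                break;
            case ',':
                if ($level === 0) {
                    $values[] = substr($str, $start, $i - $start);
                    $start = $i + 1;
                }
                break;
            case '{':
                $level++;
                break;
            case '}':
                if (--$level < 0) {
                    throw new UnexpectedValueException("unexpected } at offset $i");
                }
                break;
        }
    }
    if ($inString) {
        throw new UnexpectedValueException('unterminated string');
    }
    if ($level > 0) {
        throw new UnexpectedValueException('missing }');
    }
    $values[] = substr($str, $start);
    return $values;
}

// Hypothetical row: the "}" inside the string must not close the script.
$fields = split_db_row('7,{ mes "a } b, c"; },{}');
```

Throwing exceptions instead of calling trigger_error is a design choice of this sketch: it lets a converter skip or log a malformed row instead of continuing with a half-parsed one.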

Upvotes: 1
