Parse poker game description (generated by multiple different converters)

Question

For a hobby project I am trying to write a poker application. Part of its functionality is the ability to parse messages from poker forums that contain game descriptions. Here are plain-text versions of two example messages:

EXAMPLE 1

    $0.02/$0.05 No-Limit Hold'em (8 handed) 

Known players:
 BB: $1.70   UTG2: $13.05   MP1: $2.89   MP2: $2.64   MP3 (Hero): $5.28   CO
: $5.00   BU: $5.00   SB: $11.37  

Preflop: Hero is MP3 with 8 
[http://resources.pokerstrategy.com/smileys/heart.png], 8 
[http://resources.pokerstrategy.com/smileys/club.png].
UTG2 folds, MP1 raises to $0.15, MP2 calls $0.15, Hero calls $0.15, CO folds, 
BU calls $0.15,  2 folds, BB folds.

Flop: ($0.67) 8 [http://resources.pokerstrategy.com/smileys/diamond.png], K 
[http://resources.pokerstrategy.com/smileys/club.png], 6 
[http://resources.pokerstrategy.com/smileys/diamond.png] (4 players)
MP1 checks, MP2 checks, Hero bets $0.47, BU folds, MP1 folds, MP2 calls $0.47.

Turn: ($1.61) A [http://resources.pokerstrategy.com/smileys/club.png] (2 
players)
MP2 checks, Hero checks.

River: ($1.61) Q [http://resources.pokerstrategy.com/smileys/club.png] (2 
players)
MP2 bets $0.60, Hero raises to $2.10, MP2 calls $1.42.

Final Pot: $5.73.

EXAMPLE 2

Grabbed by Holdem Manager 
 NL Holdem $0.05(BB) Replayer 
 SB ($5.02)
 BB ($4.78)
 UTG ($2)
 UTG+1 ($2)
 UTG+2 ($1.88)
 MP1 ($5.32)
CO ($10.36) (21/18 на 109 рук, С-бет Ф=88%(11), С-бет Т=33%(3), АФ=2,6 
(4,5/3,0/0,5), WTSD=41%, W$SD=71%)
 Hero ($10.98)

Dealt to Hero 9 [http://resources.pokerstrategy.com/smileys/spade.png] T 
[http://resources.pokerstrategy.com/smileys/spade.png] 

 UTG calls $0.05, fold, fold, fold, CO raises to $0.20, Hero calls $0.20, 
fold, fold, fold

 FLOP ($0.52) J [http://resources.pokerstrategy.com/smileys/club.png] 8 
[http://resources.pokerstrategy.com/smileys/spade.png] 4 
[http://resources.pokerstrategy.com/smileys/diamond.png] 

CO bets $0.35, Hero calls $0.35

 TURN ($1.22) J [http://resources.pokerstrategy.com/smileys/club.png] 8 
[http://resources.pokerstrategy.com/smileys/spade.png] 4 
[http://resources.pokerstrategy.com/smileys/diamond.png] 9 
[http://resources.pokerstrategy.com/smileys/heart.png] 

CO checks, Hero checks

 RIVER ($1.22) J [http://resources.pokerstrategy.com/smileys/club.png] 8 
[http://resources.pokerstrategy.com/smileys/spade.png] 4 
[http://resources.pokerstrategy.com/smileys/diamond.png] 9 
[http://resources.pokerstrategy.com/smileys/heart.png] 4 
[http://resources.pokerstrategy.com/smileys/spade.png] 

CO bets $1, Hero ???

Basically, these two examples were generated by two different converters, and there are currently about 20 different converters out there.

What I need to do is parse these game descriptions from the different converters and "translate" the text description of a game into a Java Game object. I've already written some code that relies on a lot of regular expressions. It parses approximately 70% of my test cases correctly, but:

- it's really hard to maintain;
- I want to teach myself something new and cool.

So, what are my other options besides regular expressions? I am currently looking into ANTLR, but I am not sure whether it is the best choice for this task.

Vala · Accepted Answer

ANTLR would definitely be a good fit for your requirements. Using regular expressions for language processing is very brittle: any change in a converter's output from one version to the next is significantly more likely to break your interpreter than if you use ANTLR or a similar tool. What you may be able to do is write a single lexer and a base parser, which can then be extended by more specific parsers that handle the differences between the converters.
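To make the "one lexer, one base parser, converter-specific subclasses" idea concrete, here is a minimal hand-written sketch in plain Java. It is not ANTLR and not generated code; it only mirrors the structure on the hero's hole-card line from the two examples in the question. All class and method names (ConverterParserSketch, BaseParser, ForumConverterParser, HoldemManagerParser, holeCardsStart) are made up for illustration, and the token kinds are an assumption about what a shared lexer might emit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConverterParserSketch {

    enum Kind { SUIT, MONEY, WORD, PUNCT }

    record Token(Kind kind, String text) {}

    // One alternative per token kind; whitespace (including line wraps) is skipped.
    private static final Pattern TOKEN = Pattern.compile(
        "\\[\\S*/(heart|club|diamond|spade)\\.png\\]"   // suit image link
        + "|\\$(\\d+(?:\\.\\d+)?)"                       // money amount
        + "|([A-Za-z0-9+]+)"                             // word, position or card rank
        + "|([^\\s\\w])");                               // any other single symbol

    /** Shared lexer used by every converter-specific parser. */
    static List<Token> lex(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) tokens.add(new Token(Kind.SUIT, m.group(1)));
            else if (m.group(2) != null) tokens.add(new Token(Kind.MONEY, m.group(2)));
            else if (m.group(3) != null) tokens.add(new Token(Kind.WORD, m.group(3)));
            else tokens.add(new Token(Kind.PUNCT, m.group(4)));
        }
        return tokens;
    }

    /** Base parser: holds the rules that are the same for all converters. */
    abstract static class BaseParser {
        /** Converter-specific hook: index of the first card token, or -1 if not a hole-card line. */
        abstract int holeCardsStart(List<Token> tokens);

        /** Shared rule: cards are RANK SUIT pairs, optionally separated by commas. */
        List<String> parseHoleCards(String text) {
            List<Token> t = lex(text);
            List<String> cards = new ArrayList<>();
            int i = holeCardsStart(t);
            if (i < 0) return cards;
            while (i + 1 < t.size()
                    && t.get(i).kind() == Kind.WORD
                    && t.get(i + 1).kind() == Kind.SUIT) {
                cards.add(t.get(i).text() + t.get(i + 1).text().charAt(0)); // e.g. "8h"
                i += 2;
                if (i < t.size() && t.get(i).text().equals(",")) i++;
            }
            return cards;
        }
    }

    /** Example 1 style: "Preflop: Hero is MP3 with <cards>". */
    static final class ForumConverterParser extends BaseParser {
        @Override int holeCardsStart(List<Token> t) {
            if (t.isEmpty() || !t.get(0).text().equals("Preflop")) return -1;
            for (int i = 1; i < t.size(); i++) {
                if (t.get(i).text().equals("with")) return i + 1;
            }
            return -1;
        }
    }

    /** Example 2 style: "Dealt to Hero <cards>". */
    static final class HoldemManagerParser extends BaseParser {
        @Override int holeCardsStart(List<Token> t) {
            boolean matches = t.size() > 3
                && t.get(0).text().equals("Dealt")
                && t.get(1).text().equals("to")
                && t.get(2).text().equals("Hero");
            return matches ? 3 : -1;
        }
    }

    public static void main(String[] args) {
        String url = "http://resources.pokerstrategy.com/smileys/";
        String ex1 = "Preflop: Hero is MP3 with 8 \n[" + url + "heart.png], 8 \n[" + url + "club.png].";
        String ex2 = "Dealt to Hero 9 [" + url + "spade.png] T \n[" + url + "spade.png]";
        System.out.println(new ForumConverterParser().parseHoleCards(ex1)); // prints [8h, 8c]
        System.out.println(new HoldemManagerParser().parseHoleCards(ex2));  // prints [9s, Ts]
    }
}
```

With ANTLR, the lexer rules and the shared parser rules would live in grammar files rather than in Java code, but the split is the same: supporting a new converter means adding or overriding a few rules instead of touching one large set of regular expressions.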

Once you have made a Lexer and know what you're doing with the parsers, it's A LOT quicker to change things in ANTLR than in a roll-your-own solution! I do agree that ANTLR docs aren't the best, but there are a ton of good tutorials out there for ANTLR 3 from third parties (and you'll get good help here on SO for any specific problems).

My personal preference is to write fairly simple lexers/parsers that output ASTs, then manually code a tree walker that visits the nodes as the parser provides them. Some will argue for generating the tree walker with ANTLR as well, but I found this more difficult and time-consuming than it was worth (as it was largely not reusable anyway).
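As a rough illustration of the hand-coded tree walker, the sketch below walks a small generic tree and accumulates the pot. The Node type is only a stand-in for whatever tree the parser hands back (it is not ANTLR's tree class), the Game class is a hypothetical placeholder for the Game object from the question, and the tree is built by hand from the flop of Example 1. Only "bets" and "calls" are handled; "raises to" needs per-player bookkeeping and is left out.

```java
import java.math.BigDecimal;
import java.util.List;

final class TreeWalkerSketch {

    /** Generic AST node: a type label, optional text, and children. */
    record Node(String type, String text, List<Node> children) {
        static Node leaf(String type, String text) {
            return new Node(type, text, List.of());
        }
        static Node of(String type, Node... children) {
            return new Node(type, "", List.of(children));
        }
    }

    /** Hypothetical stand-in for the Game object; it only tracks the pot here. */
    static final class Game {
        BigDecimal pot = BigDecimal.ZERO;
    }

    /** Hand-written walker: dispatches on the node type and updates the Game. */
    static void walk(Node node, Game game) {
        switch (node.type()) {
            case "STREET" -> node.children().forEach(child -> walk(child, game));
            case "POT" -> game.pot = new BigDecimal(node.text());
            case "ACTION" -> {
                // Children are PLAYER, VERB and, for bets and calls, AMOUNT.
                String verb = node.children().get(1).text();
                if (verb.equals("bets") || verb.equals("calls")) {
                    game.pot = game.pot.add(new BigDecimal(node.children().get(2).text()));
                }
            }
            default -> { /* PLAYER, VERB and AMOUNT leaves are read by their parent */ }
        }
    }

    /** Builds an ACTION node; pass null as the amount for checks and folds. */
    static Node action(String player, String verb, String amount) {
        Node p = Node.leaf("PLAYER", player);
        Node v = Node.leaf("VERB", verb);
        return amount == null
            ? Node.of("ACTION", p, v)
            : Node.of("ACTION", p, v, Node.leaf("AMOUNT", amount));
    }

    /** The flop of Example 1, written out as the tree a parser might produce. */
    static Node example1Flop() {
        return Node.of("STREET",
            Node.leaf("POT", "0.67"),
            action("MP1", "checks", null),
            action("MP2", "checks", null),
            action("Hero", "bets", "0.47"),
            action("BU", "folds", null),
            action("MP1", "folds", null),
            action("MP2", "calls", "0.47"));
    }

    public static void main(String[] args) {
        Game game = new Game();
        walk(example1Flop(), game);
        System.out.println(game.pot); // prints 1.61, the pot shown on the turn in Example 1
    }
}
```

Because the walker is ordinary Java, the same code can serve every converter as long as their parsers agree on the node types they emit.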

It may take a little while to get used to the mindset of creating a good grammar file, but it's very satisfying once you've done it, and you'll see how much better it is the first time you need to modify or extend something. ;)
