Reputation: 52498
Below is a block of text from a popular online game called EVE Online; it gets mailed to you when you kill a person in-game. I'm building a tool to parse these mails using PHP and extract all the relevant information. I need every piece of information shown, and I'm writing classes to break it into neatly encapsulated data.
2008.06.19 20:53:00
Victim: Massi
Corp: Cygnus Alpha Syndicate
Alliance: NONE
Faction: NONE
Destroyed: Raven
System: Jan
Security: 0.4
Damage Taken: 48436
Involved parties:
Name: Kale Kold
Security: -10.0
Corp: Vicious Little Killers
Alliance: NONE
Faction: NONE
Ship: Drake
Weapon: Hobgoblin II
Damage Done: 22093
Name: Harulth (laid the final blow)
Security: -10.0
Corp: Vicious Little Killers
Alliance: NONE
Faction: NONE
Ship: Drake
Weapon: Caldari Navy Scourge Heavy Missile
Damage Done: 16687
Name: Gistatis Tribuni / Angel Cartel
Damage Done: 9656
Destroyed items:
Capacitor Power Relay II, Qty: 2
Paradise Cruise Missile, Qty: 23
Cataclysm Cruise Missile, Qty: 12
Small Tractor Beam I
Alloyed Tritanium Bar, Qty: 2 (Cargo)
Paradise Cruise Missile, Qty: 1874 (Cargo)
Contaminated Nanite Compound (Cargo)
Capacitor Control Circuit I, Qty: 3
Ballistic Deflection Field I
'Malkuth' Cruise Launcher I, Qty: 3
Angel Electrum Tag, Qty: 2 (Cargo)
Dropped items:
Ballistic Control System I
Shield Boost Amplifier I, Qty: 2
Charred Micro Circuit, Qty: 4 (Cargo)
Capacitor Power Relay II, Qty: 2
Paradise Cruise Missile, Qty: 10
Cataclysm Cruise Missile, Qty: 21
X-Large Shield Booster II
Cataclysm Cruise Missile, Qty: 3220 (Cargo)
Fried Interface Circuit (Cargo)
F-S15 Braced Deflection Shield Matrix, Qty: 2
Salvager I
'Arbalest' Cruise Launcher I
'Malkuth' Cruise Launcher I, Qty: 2
I'm thinking about using regular expressions to parse the data, but how would you approach this? Would you collapse the mail into a one-line string or parse each line from an array? The trouble is that there are a few anomalies to account for.
First, the 'Involved parties:' section is dynamic and can contain many people, all with a structure similar to the one above, but if a computer-controlled enemy takes a shot at the victim too, its entry is shortened to only the 'Name' and 'Damage Done' fields, as shown above (Gistatis Tribuni / Angel Cartel).
Second, the 'Destroyed' and 'Dropped' item lists are dynamic and will have different lengths in each mail, and I will also need to get the quantity of each item and whether or not it is in cargo.
Ideas for an approach are welcome.
Upvotes: 2
Views: 1595
Reputation: 300865
I'd probably go with a state machine approach, reading each line in sequence and dealing with it depending on the current state.
Some lines, like "Dropped items:" change the state, causing you to interpret following lines as items. While in the "reading involved parties" state you'd be adding each line to an array of data about the person, and when you read a blank line, you know you have a complete record.
Here's a rough FSM I knocked up in GraphViz
Some edges will trigger actions in your code, like reading blank lines.
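To make the idea concrete, here is a minimal sketch of such a line-by-line state machine in PHP. The function name `parseKillmail()` and the state names are my own illustrative choices, not anything from the game or a library. Since the sample mail above shows no blank lines, this sketch assumes a new party record starts at each `Name:` line rather than relying on blank lines, and it leaves item lines as raw strings.

```php
<?php
// Hypothetical helper: parseKillmail() and the state names are illustrative only.
function parseKillmail(string $text): array
{
    $mail  = ['date' => null, 'victim' => [], 'parties' => [], 'destroyed' => [], 'dropped' => []];
    $state = 'date';   // current state of the machine
    $party = null;     // index of the party record currently being filled

    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;  // blank lines carry no data in this sketch
        }

        // Section headers only change the state.
        if ($line === 'Involved parties:') { $state = 'parties';   continue; }
        if ($line === 'Destroyed items:')  { $state = 'destroyed'; continue; }
        if ($line === 'Dropped items:')    { $state = 'dropped';   continue; }

        switch ($state) {
            case 'date':
                $mail['date'] = $line;
                $state = 'victim';   // everything up to the next header describes the victim
                break;
            case 'victim':
                [$key, $value] = array_pad(explode(': ', $line, 2), 2, '');
                $mail['victim'][$key] = $value;
                break;
            case 'parties':
                [$key, $value] = array_pad(explode(': ', $line, 2), 2, '');
                if ($key === 'Name' || $party === null) {
                    $mail['parties'][] = [];              // start a new record
                    $party = count($mail['parties']) - 1;
                }
                $mail['parties'][$party][$key] = $value;
                break;
            default:   // 'destroyed' or 'dropped': keep the raw item line
                $mail[$state][] = $line;
        }
    }
    return $mail;
}
```

Calling `parseKillmail(file_get_contents('test.txt'))` would return one nested array per mail. The shortened NPC entries need no special case here, because a record simply holds whatever fields follow its `Name:` line.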
Upvotes: 12
Reputation: 596813
If you want something flexible, use the state machine approach.
If you want something quick and dirty, use regexp.
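As a rough illustration of the quick-and-dirty route, a single regular expression can pull the name, quantity and cargo flag out of one item line. This is only a sketch: `parseItemLine()` is a hypothetical helper, and the pattern assumes item lines always look like the ones in the question (name, optional `, Qty: N`, optional `(Cargo)`).

```php
<?php
// Hypothetical helper: parseItemLine() is illustrative, not part of any library.
function parseItemLine(string $line): ?array
{
    // name, then an optional ", Qty: N", then an optional " (Cargo)"
    $pattern = '/^(?<name>.+?)(?:, Qty: (?<qty>\d+))?(?<cargo> \(Cargo\))?$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null;   // empty line or something unexpected
    }
    return [
        'name'  => $m['name'],
        // a missing quantity means a single item
        'qty'   => (isset($m['qty']) && $m['qty'] !== '') ? (int) $m['qty'] : 1,
        'cargo' => !empty($m['cargo']),
    ];
}
```

For example, `parseItemLine('Alloyed Tritanium Bar, Qty: 2 (Cargo)')` yields the name `Alloyed Tritanium Bar`, a quantity of 2 and `cargo` set to true, while a line without `Qty:` falls back to a quantity of 1.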
For the first solution, you can use libraries that specialize in parsing, since it's not a trivial task. But because this is a fairly simple format, you can hack together a naive parser, for example:
<?php
class Parser
{
    /* Enclosing the parser in a class is not mandatory, but it is cleaner */
    public $date, $victim, $parties, $items, $file;
    private $states, $state, $item_parsing_state, $partie_parsing_state, $parse_tools;

    /* PHP 7+ no longer treats a method named after the class as the constructor */
    function __construct()
    {
        /* data holders */
        $this->date = '';
        $this->parties = array();
        $this->victim = array();
        $this->items = array("Destroyed" => array(),
                             "Dropped" => array());
        /* Map your states to actions. Sub-states can be necessary (and sub-parsers too :-) */
        $this->states = array('Victim' => 'victim_parsing',
                              'Involved' => 'parties_parsing',
                              'items:' => 'item_parsing');
        $this->state = 'start';
        $this->item_parsing_state = 'Destroyed';
        $this->partie_parsing_state = '';
        $this->parse_tools = array('start' => 'start_parsing',
                                   'parties_parsing' => 'parties_parsing',
                                   'item_parsing' => 'item_parsing',
                                   'victim_parsing' => 'victim_parsing');
    }

    /* the magic job is done here: a keyword in the line switches the state */
    function checkLine($line)
    {
        foreach ($this->states as $keyword => $state)
            if (strpos($line, $keyword) !== false)
                $this->state = $state;
        return trim($line);
    }

    function parse($file)
    {
        $this->file = new SplFileObject($file);
        foreach ($this->file as $line)
            if ($line = $this->checkLine($line))
                $this->{$this->parse_tools[$this->state]}($line);
    }

    /* then here you can define as many parsing rules as you want */
    function victim_parsing($line)
    {
        $victim_caract = explode(': ', $line, 2);
        if (count($victim_caract) == 2)
            $this->victim[$victim_caract[0]] = $victim_caract[1];
    }

    function start_parsing($line)
    {
        $this->date = $line;
    }

    function item_parsing($line)
    {
        if (strpos($line, 'items:') !== false)
        {
            /* "Destroyed items:" or "Dropped items:" selects the target list */
            $item_state = explode(' ', $line);
            $this->item_parsing_state = $item_state[0];
        }
        else
        {
            /* "(Cargo)" can appear with or without a quantity */
            $cargo = (substr($line, -7) == '(Cargo)');
            if ($cargo)
                $line = trim(substr($line, 0, -7));
            $item_caract = explode(', Qty: ', $line);
            $qty = isset($item_caract[1]) ? (int) $item_caract[1] : 1;
            /* append rather than key by name: the same item can be listed twice */
            $this->items[$this->item_parsing_state][] = array('name' => $item_caract[0],
                                                              'qty' => $qty,
                                                              'cargo' => $cargo);
        }
    }

    function parties_parsing($line)
    {
        $partie_caract = explode(': ', $line, 2);
        if (count($partie_caract) < 2)
            return; /* the "Involved parties:" header carries no data */
        if ($partie_caract[0] == "Name")
        {
            $this->partie_parsing_state = $partie_caract[1];
            $this->parties[$this->partie_parsing_state] = array();
        }
        else
            $this->parties[$this->partie_parsing_state][$partie_caract[0]] = $partie_caract[1];
    }
}

/* a little test */
$parser = new Parser();
$parser->parse('test.txt');
echo "======== Fight report - ".$parser->date." ==========\n\n";
echo "Victim :\n\n";
print_r($parser->victim);
echo "Parties :\n\n";
print_r($parser->parties);
echo "Items: \n\n";
print_r($parser->items);
?>
We can get away with this because reliability and performance are not an issue here :-)
Happy gaming!
Upvotes: 4
Reputation: 96159
You might be interested in http://pear.php.net/package/PHP_LexerGenerator
(Yes, it's alpha. Yes, I haven't used it myself. Yes, you need to know/learn the lexer syntax. Why do I suggest it? Just curious what your experience with it would be ;-))
Upvotes: 1