Adrian Smith

Reputation: 19

Parsing XML snippets contained in a log file

I have a data file from an external source that contains multiple XML snippets (up to 350). Each one starts with a line of log text followed by XML output, like this:

2025-02-21 16:45:55,760 - Transaction RUN04-merchtranid1 - Success: <?xml version="1.0" encoding="UTF-8"?>
<payment_response>
  <transaction_type>sale</transaction_type>
  <status>approved</status>
  <recurring_type>initial</recurring_type>
  <unique_id>ccc01a6fb43b45cf38298fee067d1688</unique_id>
  <transaction_id>RUN04-merchtranid1</transaction_id>
  <mode>test</mode>
  <timestamp>2025-02-21T14:45:55Z</timestamp>
  <descriptor>UAT Gen Current UK</descriptor>
  <amount>0</amount>
  <currency>EUR</currency>
  <sent_to_acquirer>true</sent_to_acquirer>
  <scheme_transaction_identifier>485029514074150</scheme_transaction_identifier>
</payment_response>

2025-02-21 16:45:56,704 - Transaction RUN04-merchtranid2 - Success: <?xml version="1.0" encoding="UTF-8"?>
<payment_response>
  <transaction_type>sale</transaction_type>
  <status>approved</status>
  <recurring_type>initial</recurring_type>
  <unique_id>1f293c3166045f645b9ea4aeee755840</unique_id>
  <transaction_id>RUN04-merchtranid2</transaction_id>
  <mode>test</mode>
  <timestamp>2025-02-21T14:45:56Z</timestamp>
  <descriptor>UAT Gen Current UK</descriptor>
  <amount>0</amount>
  <currency>GBP</currency>
  <sent_to_acquirer>true</sent_to_acquirer>
  <scheme_transaction_identifier>MDHMKJSTW</scheme_transaction_identifier>
  <scheme_settlement_date>0129</scheme_settlement_date>
</payment_response>

I need to loop through the file with PHP and pick out the <unique_id> and matching <transaction_id> for each transaction. I could do this if it were a well-formed XML document, but this obviously isn't.

I would really appreciate any ideas.

Upvotes: 1

Views: 42

Answers (1)

Maciej Łebkowski

Reputation: 3942

Assuming your input is in the $input variable, here is a simple script that parses it line by line. If a line starts with a timestamp, it starts collecting a new document; otherwise the line is appended to the current one.

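$documents = [];
// Split on any line ending so Windows-style logs are handled too.
foreach (preg_split('/\R/', $input) as $line) {
    // A line that begins with a timestamp starts a new XML document.
    if (preg_match('/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} - [^<]+ (?P<xml><.*)$/', $line, $m)) {
        $documents[] = $m['xml'];
    } elseif ($documents) {
        // Continuation line: append it to the document being collected.
        $documents[count($documents) - 1] .= "\n" . $line;
    }
}

$transactions = [];
foreach ($documents as $document) {
    $xml = simplexml_load_string($document);
    if ($xml === false) {
        continue; // skip snippets that are not well-formed XML
    }
    // Map each unique_id to its transaction_id.
    $transactions[(string) $xml->unique_id] = (string) $xml->transaction_id;
}
var_dump($transactions);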

This might require some tweaks if your input contains more distinct patterns.
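
If the log prefixes turn out to vary, one possible alternative is to ignore them entirely and pull out each <payment_response> element with a regular expression before handing it to SimpleXML. This is only a sketch: the function name extractTransactions and the shortened $sample input are made up for illustration, and it assumes the root element is always <payment_response> with no attributes and is never nested.

```php
<?php
// Hypothetical helper: collect unique_id => transaction_id pairs from raw log text.
function extractTransactions(string $input): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    // Grab each <payment_response> element, skipping log prefixes and XML declarations.
    preg_match_all('~<payment_response>.*?</payment_response>~s', $input, $matches);

    $transactions = [];
    foreach ($matches[0] as $fragment) {
        $xml = simplexml_load_string($fragment);
        if ($xml === false) {
            continue; // skip fragments that are not well-formed XML
        }
        $transactions[(string) $xml->unique_id] = (string) $xml->transaction_id;
    }

    // Restore the previous libxml error-handling setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $transactions;
}

// Shortened sample based on the first snippet in the question.
$sample = <<<LOG
2025-02-21 16:45:55,760 - Transaction RUN04-merchtranid1 - Success: <?xml version="1.0" encoding="UTF-8"?>
<payment_response>
  <unique_id>ccc01a6fb43b45cf38298fee067d1688</unique_id>
  <transaction_id>RUN04-merchtranid1</transaction_id>
</payment_response>
LOG;

print_r(extractTransactions($sample));
```

The result is an array keyed by unique_id, the same shape as $transactions above. The trade-off is that this version depends on the element name rather than on the timestamp format, so it is worth choosing whichever of the two is more stable in your data.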

Upvotes: 0
