WirelessFan

Reputation: 1

Use an XML parser for a non-XML file?

I have a bunch of LTE CDRs that, when decoded, look and feel just like XML but aren't (I'm not sure of the exact differences, but the format is hierarchical, similar to XML). I've copied one of the entries below. There are 50 or 60 entries just like this in each file.

My goal is to search for an entry that matches an IP address (shown below in hex) and a time range, and then to correlate the IMSI with it. These fields are shown below.

Fields I'm searching for:

...
<servedIMSI>13 91 03 00 00 00 10 F8</servedIMSI>
...
<servedPDPAddress>
    <iPAddress>
        <iPBinaryAddress>
            <iPBinV4Address>0A 37 00 11</iPBinV4Address>
        </iPBinaryAddress>
    </iPAddress>
</servedPDPAddress>
...
<timeOfFirstUsage>14 02 04 04 09 40 2D 06 00</timeOfFirstUsage>
<timeOfLastUsage>14 02 04 04 12 44 2D 06 00</timeOfLastUsage>
...

I've tried to use XML tools, but since this is not XML, they don't work.

I was wondering if there is a better way to search for and retrieve the data I want. I can use regular expressions to find the data, but an XML-style approach seems better suited to it (even though this isn't XML). I'm open to any and all ideas!

Snippet of CDR:

<GPRSRecord>
    <egsnPDPRecord>
        <recordType>70</recordType>
        <servedIMSI>13 91 03 00 00 00 10 F8</servedIMSI>
        <ggsnAddress>
            <iPBinaryAddress>
                <iPBinV4Address>AB CD 72 62</iPBinV4Address>
            </iPBinaryAddress>
        </ggsnAddress>
        <chargingID>126400647</chargingID>
        <sgsnAddress>
            <iPBinaryAddress>
                <iPBinV4Address>AB CD 72 62</iPBinV4Address>
            </iPBinaryAddress>
        </sgsnAddress>
        <accessPointNameNI><bs/>Internet<si/>syringawireless<etx/>com</accessPointNameNI>
        <pdpType>01 21</pdpType>
        <servedPDPAddress>
            <iPAddress>
                <iPBinaryAddress>
                    <iPBinV4Address>0A 37 00 11</iPBinV4Address>
                </iPBinaryAddress>
            </iPAddress>
        </servedPDPAddress>
        <dynamicAddressFlag><true/></dynamicAddressFlag>
        <listOfTrafficVolumes>
            <ChangeOfCharCondition>
                <dataVolumeGPRSUplink>192323</dataVolumeGPRSUplink>
                <dataVolumeGPRSDownlink>320043</dataVolumeGPRSDownlink>
                <changeCondition><recordClosure/></changeCondition>
                <changeTime>14 02 04 04 12 46 2D 06 00</changeTime>
                <userLocationInformation>01 13 01 39 01 86 BD 01</userLocationInformation>
            </ChangeOfCharCondition>
        </listOfTrafficVolumes>
        <recordOpeningTime>14 02 04 04 09 40 2D 06 00</recordOpeningTime>
        <duration>186</duration>
        <causeForRecClosing>16</causeForRecClosing>
        <recordSequenceNumber>26784</recordSequenceNumber>
        <nodeID>1</nodeID>
        <localSequenceNumber>8858562</localSequenceNumber>
        <apnSelectionMode><mSorNetworkProvidedSubscriptionVerified/></apnSelectionMode>
        <servedMSISDN>91 02 98 99 00 81</servedMSISDN>
        <chargingCharacteristics>01 00</chargingCharacteristics>
        <chChSelectionMode><sGSNSupplied/></chChSelectionMode>
        <sgsnPLMNIdentifier>13 01 39</sgsnPLMNIdentifier>
        <servedIMEISV>53 97 04 40 81 57 80 00</servedIMEISV>
        <rATType>6</rATType>
        <userLocationInformation>01 13 01 39 01 86 BD 01</userLocationInformation>
        <listOfServiceData>
            <ChangeOfServiceCondition>
                <ratingGroup>1</ratingGroup>
                <localSequenceNumber>1</localSequenceNumber>
                <timeOfFirstUsage>14 02 04 04 09 40 2D 06 00</timeOfFirstUsage>
                <timeOfLastUsage>14 02 04 04 12 44 2D 06 00</timeOfLastUsage>
                <serviceConditionChange>
                    00000000000000000000000010000000
                </serviceConditionChange>
                <sgsn-Address>
                    <iPBinaryAddress>
                        <iPBinV4Address>AB CD 72 62</iPBinV4Address>
                    </iPBinaryAddress>
                </sgsn-Address>
                <sGSNPLMNIdentifier>13 01 39</sGSNPLMNIdentifier>
                <datavolumeFBCUplink>192323</datavolumeFBCUplink>
                <datavolumeFBCDownlink>320043</datavolumeFBCDownlink>
                <timeOfReport>14 02 04 04 12 46 2D 06 00</timeOfReport>
                <rATType>6</rATType>
                <userLocationInformation>01 13 01 39 01 86 BD 01</userLocationInformation>
            </ChangeOfServiceCondition>
        </listOfServiceData>
    </egsnPDPRecord>
</GPRSRecord>    

Upvotes: 0

Views: 133

Answers (3)

abiessu

Reputation: 1927

A stateful loop in Perl could do this fairly easily, with the caveat that much of the work an XML parser does (handling entries that span several lines, for example) would need to be duplicated here for any files that do not match the example text. Something like:

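use strict;
use warnings;

# Open the CDR file for reading and stop if that fails
open(my $infile, '<', "MyCDRFile.nxm") or die "Cannot open MyCDRFile.nxm: $!";

# Tag names to search for (a hash is initialised with parentheses, not braces)
my %searches = (
  "rec_start" => "egsnPDPRecord",
  "imsi" => "servedIMSI",
  "pdp" => "servedPDPAddress",
  "ip" => "iPBinV4Address",
  "firsttime" => "timeOfFirstUsage",
  "lasttime" => "timeOfLastUsage"
);
my %finds;
my $imsi = "";
my $in_pdp = 0;  # true while inside <servedPDPAddress>

while (my $line = <$infile>) {
  chomp($line);

  # The record tag matches both the opening and the closing line; the IMSI
  # is only set by the time the closing line is reached, so print there
  if (index($line, $searches{"rec_start"}) > -1) {
    if ($imsi ne "") {
      print "[$imsi, " . join(',', @finds{"ip", "firsttime", "lasttime"}) . "]\n";
    }
    $imsi = "";
    %finds = ();
  }
  if (index($line, $searches{"imsi"}) > -1) {
    # split takes the pattern first, then the string
    $imsi = (split(/\Q$searches{imsi}\E/, $line))[1];
    $imsi =~ s![<>/]!!g;
  }
  # iPBinV4Address also appears under ggsnAddress and sgsnAddress, so only
  # accept it between <servedPDPAddress> and </servedPDPAddress>
  if (index($line, $searches{"pdp"}) > -1) {
    $in_pdp = (index($line, "</") == -1);
  }
  foreach my $search ("ip", "firsttime", "lasttime") {
    next if $search eq "ip" and not $in_pdp;
    if ($imsi ne "" and index($line, $searches{$search}) > -1) {
      $finds{$search} = (split(/\Q$searches{$search}\E/, $line))[1];
      $finds{$search} =~ s![<>/]!!g;
    }
  }
}

close($infile);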

Printing out to a separate file, reading from STDIN, etc. could all be added into this fairly easily.
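
Once the fields have been pulled out per record, the search the question describes (an IP address plus a time range, mapped back to an IMSI) could be sketched as below. This is only a minimal sketch under stated assumptions: the sub names, the record layout (a list of hashes) and the dotted address 10.55.0.17 are illustrative, and the time comparison assumes that the first six octets of each time field are BCD digits for YY MM DD hh mm ss and that every record uses the same UTC offset.

```perl
use strict;
use warnings;

# Convert a dotted-quad IPv4 address to the spaced-hex form seen in the CDR,
# e.g. "10.55.0.17" becomes "0A 37 00 11".
sub ip_to_cdr_hex {
    my ($ip) = @_;
    return join ' ', map { sprintf '%02X', $_ } split /\./, $ip;
}

# Build a sortable key from a CDR time field.
# ASSUMPTION: the first six octets are BCD digits for YY MM DD hh mm ss;
# the remaining octets (UTC offset) are ignored here.
sub cdr_time_key {
    my ($field) = @_;
    my @octets = split ' ', $field;
    return join '', @octets[0 .. 5];
}

# Return the IMSIs of records whose IP matches and whose usage window
# overlaps the range from $from to $to (both in the CDR time format).
sub matching_imsis {
    my ($records, $ip, $from, $to) = @_;
    my $want_ip = ip_to_cdr_hex($ip);
    my ($lo, $hi) = (cdr_time_key($from), cdr_time_key($to));
    return map  { $_->{imsi} }
           grep { $_->{ip} eq $want_ip
                  and cdr_time_key($_->{firsttime}) le $hi
                  and cdr_time_key($_->{lasttime})  ge $lo }
           @$records;
}

# One record, using the values from the question's sample
my @records = (
    { imsi      => '13 91 03 00 00 00 10 F8',
      ip        => '0A 37 00 11',
      firsttime => '14 02 04 04 09 40 2D 06 00',
      lasttime  => '14 02 04 04 12 44 2D 06 00' },
);

print "$_\n" for matching_imsis(\@records, '10.55.0.17',
    '14 02 04 04 10 00 2D 06 00', '14 02 04 04 11 00 2D 06 00');
```

With the sample record this prints the raw servedIMSI value, 13 91 03 00 00 00 10 F8. Comparing the digit strings with le and ge avoids any date parsing, but it only holds while the assumptions above do.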

Upvotes: 1

Borodin

Reputation: 126722

This short Perl program processes a file called GPRSRecord.xml, which contains the data you show in your question, wrapped in a <root>...</root> element. It extracts the fields that you say you're interested in from every egsnPDPRecord element that it finds. Clearly, in this case there is only one.

use strict;
use warnings;

use XML::LibXML;

my $xml = XML::LibXML->load_xml(location => 'GPRSRecord.xml');

for my $pdp_rec ($xml->findnodes('/root/GPRSRecord/egsnPDPRecord')) {

  my ($imsi_address) = $pdp_rec->findnodes('servedIMSI');
  printf "%s: %s\n", $imsi_address->nodeName, $imsi_address->textContent;

  my ($ip_v4_address) = $pdp_rec->findnodes('servedPDPAddress/iPAddress/iPBinaryAddress/iPBinV4Address');
  printf "%s: %s\n", $ip_v4_address->nodeName, $ip_v4_address->textContent;

  my ($service_condition) = $pdp_rec->findnodes('listOfServiceData/ChangeOfServiceCondition');
  my ($first_usage)       = $service_condition->findnodes('timeOfFirstUsage');
  my ($last_usage)        = $service_condition->findnodes('timeOfLastUsage');
  printf "%s: %s\n", $first_usage->nodeName, $first_usage->textContent;
  printf "%s: %s\n", $last_usage->nodeName, $last_usage->textContent;

}

output

servedIMSI: 13 91 03 00 00 00 10 F8
iPBinV4Address: 0A 37 00 11
timeOfFirstUsage: 14 02 04 04 09 40 2D 06 00
timeOfLastUsage: 14 02 04 04 12 44 2D 06 00
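
If the decoded file holds many GPRSRecord elements one after another, it has no single top-level element, which is one reason an XML parser may reject it. Here is a minimal sketch of adding the <root> wrapper mentioned above, assuming the file contains nothing but those records (no XML declaration or other header). The sub name and the input file name decoded_cdr.txt are hypothetical.

```perl
use strict;
use warnings;

# Wrap a run of sibling <GPRSRecord> elements in a single <root> element
# so that an XML parser sees exactly one top-level element.
sub wrap_in_root {
    my ($text) = @_;
    return "<root>\n$text\n</root>\n";
}

# Hypothetical file names, for illustration only
my ($in, $out) = ('decoded_cdr.txt', 'GPRSRecord.xml');

if (-e $in) {
    open(my $in_fh, '<', $in) or die "Cannot open $in: $!";
    my $text = do { local $/; <$in_fh> };   # slurp the whole file
    close($in_fh);

    open(my $out_fh, '>', $out) or die "Cannot write $out: $!";
    print {$out_fh} wrap_in_root($text);
    close($out_fh);
}
```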

Upvotes: 3

Sobrique

Reputation: 53478

XML parsers exist to parse well-formed XML. They will typically fail, often messily, if your XML is not well-formed.

Your sample does seem to be well-formed, though, so I'd start with XML::Twig, a personal favourite.

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

sub extractIMSI {
    my ( $twig, $servedIMSI ) = @_;
    print $servedIMSI -> text(),"\n";
    $twig -> purge(); #why I like XML::Twig - it lets you clear memory on the fly
}

my $parser = XML::Twig -> new ( twig_handlers => { 'servedIMSI' => \&extractIMSI } );

$parser -> parsefile ( 'test.xml' );

This works if test.xml contains your sample data, anyway.
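
The handler prints the servedIMSI octets exactly as they appear in the file. If you need the IMSI as a digit string, here is a minimal sketch of a decoder, assuming the field is TBCD-encoded (two digits per octet, low nibble first, with F as padding), which the sample value appears to follow. The sub name is illustrative.

```perl
use strict;
use warnings;

# Decode a TBCD-encoded field such as "13 91 03 00 00 00 10 F8".
# ASSUMPTION: each octet holds two digits with the low nibble first,
# and a trailing F nibble is padding.
sub tbcd_to_digits {
    my ($field) = @_;
    # Swap the two hex characters of every octet, then join them up
    my $digits = join '', map { scalar reverse $_ } split ' ', $field;
    $digits =~ s/F+$//i;   # drop the padding
    return $digits;
}

print tbcd_to_digits('13 91 03 00 00 00 10 F8'), "\n";   # prints 311930000000018
```

Inside the handler you could then print tbcd_to_digits($servedIMSI->text()) instead of the raw text.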

Upvotes: 4
