Reputation: 2136
I have a large XML document embedded in a log statement, and I am using Splunk to extract values from it. I have to use regex to find these values because I cannot change the config files; I have requested the change, but it is still pending. This is an example of the XML:
<?xml version="1.0" encoding="UTF-8"?>
<tmsTrip xmlns="http://ground.fedex.com/schemas/linehaul/trip" xmlns:ns2="http://ground.fedex.com/schemas/linehaul/TMSCommon">
<tripNumber>129271010</tripNumber>
<tripLegNumber>1</tripLegNumber>
<origin>
<ns2:numberCode>5902</ns2:numberCode>
...many more fields....
</origin>
<destination>
<ns2:numberCode>5087</ns2:numberCode>
...many more fields....
</destination>
...many more fields....
<purchasedCost>
<purchasedCostTripSegment>
<purchCostReference>2644025</purchCostReference>
<carrier>BNSF</carrier>
<vendorType>RAIL</vendorType>
<carrierTrailerType>53PC</carrierTrailerType>
<origin>
<ns2:numberCode>4022</ns2:numberCode>
...many more fields....
</origin>
<destination>
<ns2:numberCode>4040</ns2:numberCode>
...many more fields....
</destination>
<stopOff>
<ns2:stopOffLocation>
<ns2:numberCode>9996</ns2:numberCode>
...many more fields....
</ns2:stopOffLocation>
</stopOff>
<schedDispatchDate>2020-05-27T05:00:00.000Z</schedDispatchDate>
...many more fields....
</purchasedCostTripSegment>
<purchasedCostTripSegment>
<purchCostReference>2644025</purchCostReference>
<carrier>NS</carrier>
<vendorType>RAIL</vendorType>
<carrierTrailerType>53PC</carrierTrailerType>
<origin>
<ns2:numberCode>4061</ns2:numberCode>
...many more fields....
</origin>
<destination>
<ns2:numberCode>4040</ns2:numberCode>
...many more fields....
</destination>
<stopOff>
<ns2:stopOffLocation>
<ns2:numberCode>4040</ns2:numberCode>
...many more fields....
</ns2:stopOffLocation>
</stopOff>
<schedDispatchDate>2020-05-27T05:00:00.000Z</schedDispatchDate>
...many more fields....
</purchasedCostTripSegment>
</purchasedCost>
</tmsTrip>
I need to identify the ns2:numberCode of the origin and of the destination for each purchasedCostTripSegment.
I am doing this in Splunk, so the regex might be particular to Splunk. I am able to find the origins and destinations if I use the function mvindex() and count the instances of ns2:numberCode, but then they are individual fields and do not display clearly in a table. This is the regex command that returns the first origin of a purchasedCostTripSegment:
| rex max_match=0 "\<ns2\:numberCode\>(?P<location>[^\<]+)" | eval Segment1_Origin = mvindex(location, 7)
I need a regex that will return all of the origins of the purchasedCostTripSegment elements. I tried this:
| rex max_match=0 "\<purchasedCostTripSegment\>*\<origin\>*\<ns2\:numberCode\>(?P<Origin>[^\<]+)"
It returned no value. How can I write the regex to find all of the ns2:numberCode values that are in this section of the xml:
<purchasedCostTripSegment>
<purchCostReference>2644025</purchCostReference>
<carrier>BNSF</carrier>
<vendorType>RAIL</vendorType>
<carrierTrailerType>53PC</carrierTrailerType>
<origin>
<ns2:numberCode>4022</ns2:numberCode>
</purchasedCostTripSegment>
<purchasedCostTripSegment>
<purchCostReference>2644025</purchCostReference>
<carrier>NS</carrier>
<vendorType>RAIL</vendorType>
<carrierTrailerType>53PC</carrierTrailerType>
<origin>
<ns2:numberCode>4061</ns2:numberCode>
</purchasedCostTripSegment>
In the above instance, I want to return the values 4022 and 4061.
Upvotes: 1
Views: 777
Reputation: 627292
You can use this as a temporary workaround:
| rex max_match=0 "<purchasedCostTripSegment>[\s\S]*?<origin>\s*<ns2:numberCode>(?P<Origin>\d+)"
See the regex demo.
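To check the pattern outside Splunk, here is a minimal Python sketch. It runs the attempt from the question and the workaround above against a trimmed copy of the two segments. The names `SAMPLE`, `ATTEMPT`, `WORKAROUND` and `DESTINATION` are only for illustration, and the destination variant is my assumption of how the same idea would carry over, not something tested in Splunk. Python's `re` accepts the same `(?P<name>...)` group syntax, so I would expect these patterns to behave the same way under `rex`, but treat that as an assumption too.

```python
import re

# Trimmed copy of the two segments from the question (elided fields left out).
SAMPLE = """\
<purchasedCostTripSegment>
<purchCostReference>2644025</purchCostReference>
<carrier>BNSF</carrier>
<origin>
<ns2:numberCode>4022</ns2:numberCode>
</origin>
<destination>
<ns2:numberCode>4040</ns2:numberCode>
</destination>
</purchasedCostTripSegment>
<purchasedCostTripSegment>
<purchCostReference>2644025</purchCostReference>
<carrier>NS</carrier>
<origin>
<ns2:numberCode>4061</ns2:numberCode>
</origin>
<destination>
<ns2:numberCode>4040</ns2:numberCode>
</destination>
</purchasedCostTripSegment>
"""

# The attempt from the question, with the unneeded backslashes dropped.
# ">*" means "zero or more '>' characters", not "any text", so "<origin"
# would have to follow "<purchasedCostTripSegment>" immediately.
ATTEMPT = r"<purchasedCostTripSegment>*<origin>*<ns2:numberCode>(?P<Origin>[^<]+)"

# The workaround: "[\s\S]*?" lazily skips any text, including newlines,
# up to the first <origin> after each segment start.
WORKAROUND = r"<purchasedCostTripSegment>[\s\S]*?<origin>\s*<ns2:numberCode>(?P<Origin>\d+)"

# Assumption: the same shape with <destination> in place of <origin>.
DESTINATION = r"<purchasedCostTripSegment>[\s\S]*?<destination>\s*<ns2:numberCode>(?P<Destination>\d+)"

attempt_hits = re.findall(ATTEMPT, SAMPLE)
origins = re.findall(WORKAROUND, SAMPLE)
destinations = re.findall(DESTINATION, SAMPLE)

print(attempt_hits)   # []
print(origins)        # ['4022', '4061']
print(destinations)   # ['4040', '4040']
```

One caveat: the lazy `[\s\S]*?` is not bounded by `</purchasedCostTripSegment>`, so if a segment had no `<origin>` (or `<destination>`), the match would run on into the next segment and pick up its value instead.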
Details
<purchasedCostTripSegment> - some literal text
[\s\S]*? - zero or more chars, as few as possible
<origin> - some text
\s* - 0+ whitespace chars
<ns2:numberCode> - some text
(?P<Origin>\d+) - Named capturing group (for Splunk, it must be a named group): 1 or more digits.
Upvotes: 1