Jesse Brands

Reputation: 2877

String Parsing in PHP

For a small project of my own, I'm writing a parser that parses event logs from a certain application. Normally I'd have little issue with handling such a thing, but the problem is that strings from these logs do not always have the same parameters. For example, one such string could be:

DD/MM HH:MM:SS.MSEC TYPE_OF_EVENT SOURCE, SOURCE_FLAGS, TARGET, TARGET_FLAGS, PARAM1

On another occasion, the string could carry a whole series of parameters: one variant has up to 27 of them, another has 16. According to the documentation there is some logic to the parameters; for example, the 17th parameter always holds an integer. While that is good, the 17th parameter might unfortunately be the 7th item in the string. The only things that are really constant in every string are the timestamp and the first six parameters.

How would I go about parsing strings like these? I'm sorry if my question is a tad unclear; I find it difficult to word my problem.

Upvotes: 0

Views: 340

Answers (4)

Marc B

Reputation: 360572

Ok, followup for my comment up at the top.

If the log's format is "constant" based on the TYPE_OF_EVENT field, you'll just have to do some simple pre-parsing, after which the rest should follow easily.

  1. read a line
  2. extract the universally common fields: timestamp, type of event, source/target
  3. based on type_of_event, do further analysis

    switch ($eventType) {
        case 'a':
            // parse out 'a' event parameters
            break;
        case 'b':
            // parse out 'b' event parameters
            break;
        default:
            // log unknown event type for future analysis
    }

and so on.
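The steps above could be sketched in PHP roughly as follows. This is only an illustration under assumptions: the field layout is taken from the sample line in the question, and the function name `parseLogLine`, the event types `EVENT_A` and `EVENT_B`, and their parameter meanings are all hypothetical.

```php
<?php
// Hypothetical helper: splits one log line into the universally common
// fields, then dispatches on the event type for the remaining parameters.
// The layout "DATE TIME TYPE field, field, ..." is assumed from the question.
function parseLogLine(string $line): ?array
{
    // Steps 1 and 2: date, time, event type, then the comma-separated rest.
    $parts = preg_split('/\s+/', trim($line), 4);
    if (count($parts) < 4) {
        return null; // too few fields to be a log line
    }
    [$date, $time, $type, $rest] = $parts;

    $fields = array_map('trim', explode(',', $rest));
    if (count($fields) < 4) {
        return null; // source/target fields are missing
    }

    $event = [
        'date'         => $date,
        'time'         => $time,
        'type'         => $type,
        'source'       => $fields[0],
        'source_flags' => $fields[1],
        'target'       => $fields[2],
        'target_flags' => $fields[3],
    ];
    $params = array_slice($fields, 4);

    // Step 3: event-specific parsing.
    switch ($type) {
        case 'EVENT_A': // hypothetical: one integer parameter
            $event['amount'] = isset($params[0]) ? (int) $params[0] : null;
            break;
        case 'EVENT_B': // hypothetical: a name followed by a count
            $event['name']  = $params[0] ?? null;
            $event['count'] = isset($params[1]) ? (int) $params[1] : null;
            break;
        default:
            // Keep the raw parameters and note the type for later analysis.
            $event['unparsed'] = $params;
            error_log("Unknown event type: $type");
    }

    return $event;
}
```

Each `case` only needs to know the parameter order for its own event type, so a parameter that the documentation calls "the 17th" can sit at whatever position that event type actually uses.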

Upvotes: 1

mario

Reputation: 145482

That's not an input that can be "parsed" as such, because there are no fixed keywords to look out for. But regular expressions seem sufficient to extract and split up the contents.

http://regular-expressions.info/ has a good introduction, and https://stackoverflow.com/questions/89718/is-there-anything-like-regexbuddy-in-the-open-source-world lists a few cool tools that help in designing regular expressions.

In your case you would need \d+ for matching digits, use the delimiters literally, and you can probably get away with .*? separated by the , comma delimiters to find the individual parts. Maybe:

preg_match('#(\d+/\d+) (\d+:\d+:\d+\.\d+) (\w+) (.*?),(.*),(.*),...#', $line, $matches);

If there is a variable length of attributes, then you should prefer two regexps (though it can be done in one). First get the .* remainder of each line, then split it afterwards.
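A minimal sketch of that two-step approach, assuming the layout shown in the question. The sample line and the variable names are made up for illustration.

```php
<?php
// Step 1: match the fixed prefix precisely and capture the remainder whole.
$pattern = '#^(\d+/\d+)\s+(\d+:\d+:\d+\.\d+)\s+(\w+)\s+(.*)$#';

// Hypothetical sample line following the format in the question.
$line = '21/04 20:19:34.123 EVENT_A Alice, 0x511, Bob, 0x10a48, 1234';

$fields = [];
if (preg_match($pattern, $line, $m)) {
    // $m[0] is the whole match; the capture groups follow it.
    [, $date, $time, $type, $rest] = $m;

    // Step 2: split the remainder on commas, ignoring surrounding spaces.
    $fields = preg_split('/\s*,\s*/', trim($rest));
}
```

Because the second step is a plain split, the same code handles a line with 5 parameters and one with 27; only the length of `$fields` differs.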

Upvotes: 1

Cristian Radu

Reputation: 8412

How about splitting the string by the ", " separator and putting everything in an array. That way you'll have a numeric index to check if a parameter exists or not.
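As a rough sketch of that idea, assuming the timestamp and event type have already been stripped off: the sample string and the helper `intParam` are hypothetical.

```php
<?php
// Hypothetical sample: the comma-separated part of one log line.
$rest = 'Alice, 0x511, Bob, 0x10a48, 1234';

// Split on the ", " separator into a numerically indexed array.
$params = explode(', ', $rest);

// isset() on an index tells us whether that position exists at all.
$hasFifth = isset($params[4]);
$hasSixth = isset($params[5]);

// Hypothetical helper: return the parameter as an int when it exists and
// consists only of digits, otherwise null. Negative numbers would need
// a different check than ctype_digit().
function intParam(array $params, int $index): ?int
{
    if (!isset($params[$index]) || !ctype_digit($params[$index])) {
        return null;
    }
    return (int) $params[$index];
}
```

With this sample, `intParam($params, 4)` yields the integer 1234, while `intParam($params, 1)` and any index past the end yield `null`.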

Upvotes: 0

Payson Welch

Reputation: 1428

I would use a different logging solution, or find a way to modify it so that you have empty placeholders: item,,item3,,,item6 etc.

Just my opinion, without knowing too much about this app: it doesn't sound too good. I usually judge apps by factors like this. If there is no good reason for the log file to be non-standardized, then what do you think the rest of the code looks like? :)

Upvotes: 1
