Adam

Reputation: 1309

Extract username and message from string?

I'm trying to extract specific data from a log file: the date, the username, and the message itself.

This is a mock-up of what the file looks like:

[2017-03-14 11:48:22] Steve T: Hi!
[2017-03-14 11:49:01] Oscar: Hi! :D How are u doin?
[2017-03-14 11:50:24] Steve T: Im doing great :P

I can extract the date with preg_match("/(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})/", $string, $matches), but how do I fetch the username and the message with regex?

Upvotes: 1

Views: 230

Answers (2)

Casimir et Hippolyte

Reputation: 89584

An alternative approach that uses sscanf() with a format string instead of a regex:

$str = <<<'EOD'
[2017-03-14 11:48:22] Steve T: Hi!
[2017-03-14 11:49:01] Oscar: Hi! :D How are u doin?
[2017-03-14 11:50:24] Steve T: Im doing great :P
EOD;

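// Expose the sample text as a stream so it can be read line by line, like a file.
$handle = fopen("data://text/plain,$str", 'r');
while ( false !== $line = fgets($handle) ) {
    // %[^]] reads the date up to "]", %[^:] reads the username up to ":",
    // and %[^\1] reads the rest of the line (everything except the \1 control character).
    print_r(sscanf($line, "[%[^]]] %[^:]: %[^\1]"));
}
fclose($handle);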

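As a rough sketch of how the sscanf() result could be used for a single line, the three fields can be unpacked into variables. The variable names ($date, $user, $message) are only illustrative, and this variant swaps the last scanset for %[^\n] on the assumption that the trailing newline returned by fgets() is not wanted in the message:

```php
<?php
// One line as fgets() would return it, including the trailing newline.
$line = "[2017-03-14 11:49:01] Oscar: Hi! :D How are u doin?\n";

// %[^]] reads up to the closing bracket, %[^:] up to the first colon,
// and %[^\n] reads the rest of the line without the newline (assumed variant).
$fields = sscanf($line, "[%[^]]] %[^:]: %[^\n]");

// Illustrative variable names, not part of the original answer.
list($date, $user, $message) = $fields;

echo "$date | $user | $message\n";
```

Because the username scanset stops at the first colon, colons inside the message (such as the ":D" smiley) stay in the message field.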

Upvotes: 1

Christian

Reputation: 28144

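Here's a working regex (the g flag is for the regexr demo; PHP has no g modifier, so use preg_match_all() and keep only m):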

/^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] ([\w\s]+): (.+)$/gm

And you can see a demo here: https://regexr.com/3ntg7

It translates to:

  • ^ - start of line
  • \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] - the date inside square brackets (must be escaped)
  • ([\w\s]+) - the user name (a mix of word (\w) and space (\s) characters)
    • if the usernames can contain any character except colons, you can also use: ([^:]+)
  • : - colon after username (match is discarded)
  • (.+) - match everything else
  • $ - end of line

And here's a PHP demo: https://3v4l.org/ovrt6
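For reference, here is a minimal sketch of how the pattern might be applied in PHP. It assumes the whole log is already in a string; the named groups (date, user, message) and the variable names are additions for readability, not part of the pattern above:

```php
<?php
// Sample log taken from the question.
$log = <<<'EOD'
[2017-03-14 11:48:22] Steve T: Hi!
[2017-03-14 11:49:01] Oscar: Hi! :D How are u doin?
[2017-03-14 11:50:24] Steve T: Im doing great :P
EOD;

// Same pattern with named groups; PHP has no g modifier, so only m is kept
// and preg_match_all() does the "global" part.
$pattern = '/^\[(?<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (?<user>[\w\s]+): (?<message>.+)$/m';

// PREG_SET_ORDER yields one array per matched log line.
preg_match_all($pattern, $log, $entries, PREG_SET_ORDER);

foreach ($entries as $entry) {
    printf("%s | %s | %s\n", $entry['date'], $entry['user'], $entry['message']);
}
```

With PREG_SET_ORDER each element of $entries holds one log line's captures, which tends to be easier to loop over than the default grouping by capture group.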

Caveats:

  • be careful about the username format; right now I've assumed it only contains word and space characters
  • if messages can span multiple lines, then the regex needs to be adjusted
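One possible adjustment for both caveats, as a sketch only: allow any non-colon characters in the username, and let the message run until the next line that starts with a bracketed date (or the end of the input). The sample log below is hypothetical, and the lookahead assumes every new entry begins with "[" followed by a four-digit year and uses \n line endings:

```php
<?php
// Hypothetical log: a username with punctuation and a message spanning two lines.
$log = <<<'EOD'
[2017-03-14 11:48:22] Steve-T.: Hi!
How are you?
[2017-03-14 11:49:01] Oscar: Fine :D
EOD;

// [^:\n]+ accepts any username without colons or newlines.
// With the s flag, .*? can cross newlines; the lookahead stops it just before
// the next entry ("\n[YYYY-") or at the end of the input (\z).
$pattern = '/^\[(?<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (?<user>[^:\n]+): (?<message>.*?)(?=\n\[\d{4}-|\z)/ms';

preg_match_all($pattern, $log, $entries, PREG_SET_ORDER);

foreach ($entries as $entry) {
    printf("%s | %s | %s\n", $entry['date'], $entry['user'], $entry['message']);
}
```

A message line that itself starts with something like "[2017-" would be mistaken for a new entry, so a stricter lookahead (the full timestamp, for example) may be worth considering.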

Upvotes: 4
