Lester Kahn

Reputation: 31

Stuck on a Perl regex for a string with trailing whitespace

The following is a line from an FTP log:

    2013-03-05 18:37:31 543.21.12.22 []sent /home/mydomain/public_html/court-9746hd/Chairman-confidential-video.mpeg 226 [email protected] 256

I am using a program called Simple Event Correlator (SEC), which pulls values from inside the parentheses of a regular expression and assigns those values to variables.

Here is an entry from an SEC config file that is supposed to operate on the log line above:

    pattern=sent \/home\/mydomain\/public_html\/(.*)\/(.*)

This succeeds in pulling out the logged-in user, court-9746hd, and assigning it to a variable, but fails to properly extract the name of the downloaded file, Chairman-confidential-video.mpeg.

Instead, it pulls out the file downloaded as: Chairman-confidential-video.mpeg 226 [email protected] 256

As you can see, I'm having difficulty getting the second capture to stop at the first whitespace after the file name. I've tried:

    pattern=sent \/home\/mydomain\/public_html\/(.*)\/(.*)\s

but I only get the same result. Any help would be greatly appreciated.

Upvotes: 0

Views: 205

Answers (3)

Kenosis

Reputation: 6204

One option is to first capture the full path from the line, and then use File::Spec to get the user and file info:

    use strict;
    use warnings;
    use File::Spec;

    my $line = '2013-03-05 18:37:31 543.21.12.22 []sent /home/mydomain/public_html/court-9746hd/Chairman-confidential-video.mpeg 226 [email protected] 256';
    my ( $path ) = $line =~ m!\s+(/home\S+)\s+!;
    my ( $user, $file ) = ( File::Spec->splitdir($path) )[ -2, -1 ];

    print "User: $user\nFile: $file";

Output:

    User: court-9746hd
    File: Chairman-confidential-video.mpeg

However, if you want to only use a regex, the following will work:

    m!/home/.+/.+/([^/]+)/(\S+)!
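
For completeness, here is a minimal sketch of how that regex-only pattern might be applied to the sample line. It assumes the path always has the layout shown in the question (two directories after /home, then the user directory, then the file); the error message is only illustrative.

```perl
use strict;
use warnings;

# Sample line copied from the question.
my $line = '2013-03-05 18:37:31 543.21.12.22 []sent /home/mydomain/public_html/court-9746hd/Chairman-confidential-video.mpeg 226 [email protected] 256';

# In list context a successful match returns the captured groups;
# a failed match returns an empty list, which triggers the die.
my ( $user, $file ) = $line =~ m!/home/.+/.+/([^/]+)/(\S+)!
    or die "Line did not match the expected layout\n";

print "User: $user\nFile: $file\n";
```

Here `[^/]+` keeps the user capture inside a single path component, and `\S+` ends the file capture at the first whitespace character.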

Upvotes: 0

Devin Ceartas

Reputation: 4829

As a general rule, prefer something narrower in scope than the .* construct. In this case what you want is a run of characters that are not whitespace, so say that explicitly:

    pattern=sent \/home\/mydomain\/public_html\/([^\s]+)\/([^\s]+)
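
To see why appending `\s` to the original pattern does not help, the two patterns can be compared in plain Perl. This is only a sketch run outside SEC, on the sample line from the question; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Sample line copied from the question.
my $line = '2013-03-05 18:37:31 543.21.12.22 []sent /home/mydomain/public_html/court-9746hd/Chairman-confidential-video.mpeg 226 [email protected] 256';

# Greedy: the second (.*) first runs to the end of the line. The trailing \s
# only forces it to back off as far as the LAST whitespace character.
my ( $user_greedy, $file_greedy ) =
    $line =~ m{sent /home/mydomain/public_html/(.*)/(.*)\s};

# Narrow: [^\s]+ cannot cross whitespace, so the capture ends with the file name.
my ( $user, $file ) =
    $line =~ m{sent /home/mydomain/public_html/([^\s]+)/([^\s]+)};

print "greedy: $file_greedy\n";
print "narrow: $file\n";
```

The greedy capture still contains the fields that follow the file name, while the narrow one stops at the first whitespace. Note that `[^\s]` is equivalent to `\S`.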

Upvotes: 0

ChrisF

Reputation: 180

If you only want to match non-whitespace, replace .* with \S*. If a space is the only character you want to exclude, use [^ ]* instead.

Also, man perlre is a good reference.
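
The difference between the two only shows up when the separator is not a plain space. A small sketch, using a made-up tab-separated string rather than the log line from the question:

```perl
use strict;
use warnings;

# Hypothetical input: a file name followed by a TAB rather than a space.
my $text = "report.mpeg\t226";

my ($non_ws)    = $text =~ /^(\S*)/;      # stops at any whitespace, including the tab
my ($non_space) = $text =~ /^([^ ]*)/;    # stops only at a literal space

print '\S*   captured ', length($non_ws),    " characters\n";
print '[^ ]* captured ', length($non_space), " characters\n";
```

With this input `\S*` stops at the tab, whereas `[^ ]*` runs through it to the end of the string, so `\S*` is the safer choice if the log might use tabs.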

Upvotes: 2
