Reputation: 467
I have the following Perl strings. The lengths and the patterns are different. The file is always named *log.999
my $file1 = '/user/mike/desktop/sys/syslog.1';
my $file2 = '/user/mike/desktop/movie/dnslog.2';
my $file3 = '/haselog.3';
my $file4 = '/user/mike/desktop/movie/dns-sys.log';
I need to extract the words before log. In this case: sys, dns, hase and dns-sys.
How can I write a regular expression to extract them?
Upvotes: 0
Views: 2202
Reputation: 66873
The main property of the strings shown is that the *log* phrase comes last. So anchor the pattern to the end of the string, so that it can't match a log somewhere in the middle:
my ($name) = $string =~ /(\w+)log\.[0-9]+$/;
or, if the .N extension is optional:
my ($name) = $string =~ /(\w+)log(?:\.[0-9]+)?$/;
The above uses the \w+ pattern to capture the text preceding log. But that text may also contain non-word characters (-, ., etc.), in which case we would use [^/]+ to capture everything after the last /, as pointed out in Abigail's answer. With .N optional, per the question in the comments:
my ($name) = $string =~ m{ ([^/]+) log (?: \.[0-9]+ )? $}x;
where I added the x modifier (the x after the closing }), under which whitespace inside the pattern is ignored, which can aid readability.
I use a set of delimiters other than / to be able to use / inside without escaping it, and then the m is compulsory. The [^...] is a negated character class, matching any character not listed inside. So [^/]+log matches all successive characters which are not /, coming before log.
The non-capturing group (?: ... ) groups the patterns inside, so that ? applies to the whole group, but doesn't needlessly capture them.
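To compare the \w+ and [^/]+ captures side by side, here is a minimal sketch run over the four paths from the question. The helper names name_word and name_any are made up for this illustration, and the // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Sample paths copied from the question.
my @paths = (
    '/user/mike/desktop/sys/syslog.1',
    '/user/mike/desktop/movie/dnslog.2',
    '/haselog.3',
    '/user/mike/desktop/movie/dns-sys.log',
);

# Hypothetical helper: capture with \w+, so only word characters may precede "log".
sub name_word {
    my ($path) = @_;
    my ($name) = $path =~ /(\w+)log(?:\.[0-9]+)?$/;
    return $name;    # undef when the pattern does not match
}

# Hypothetical helper: capture everything after the last "/" up to "log".
sub name_any {
    my ($path) = @_;
    my ($name) = $path =~ m{ ([^/]+) log (?: \.[0-9]+ )? $}x;
    return $name;    # undef when the pattern does not match
}

for my $path (@paths) {
    printf "%-40s \\w+: %-12s [^/]+: %s\n",
        $path,
        name_word($path) // '(no match)',
        name_any($path)  // '(no match)';
}
```

Tracing these by hand, the first three paths give sys, dns and hase with either pattern. For dns-sys.log the \w+ version finds no match, while the [^/]+ version captures dns-sys. with the trailing dot, since the dot sits right before log. If that dot is unwanted it would need to be stripped separately.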
The (?:\.[0-9]+)? pattern was written specifically to disallow things like log. (nothing after the dot) and log5. But if these are acceptable, change it to the simpler \.?[0-9]*
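A small sketch of the difference between the two suffix patterns, assuming a few made-up file names that hit those edge cases. The helper names strict_name and loose_name are hypothetical.

```perl
use strict;
use warnings;

# Strict form: the suffix is either absent or a dot followed by digits.
sub strict_name {
    my ($s) = @_;
    my ($name) = $s =~ m{ ([^/]+) log (?: \.[0-9]+ )? $}x;
    return $name;
}

# Looser form: an optional dot, then any number of digits (including none).
sub loose_name {
    my ($s) = @_;
    my ($name) = $s =~ m{ ([^/]+) log \.? [0-9]* $}x;
    return $name;
}

# Made-up names: a normal case, no extension, a bare dot, digits without a dot.
for my $s ('/syslog.5', '/syslog', '/syslog.', '/syslog5') {
    printf "%-10s strict: %-12s loose: %s\n",
        $s,
        strict_name($s) // '(no match)',
        loose_name($s)  // '(no match)';
}
```

Both forms should return sys for /syslog.5 and /syslog. Only the looser one accepts /syslog. and /syslog5.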
Update: Corrected a typo in code: for the optional .N there is +, not *
Upvotes: 1
Reputation: 336088
\w+(?=log\b) matches one or more word characters (letters, digits or underscore) that are followed by log (but not logging etc.)
If the filename format is fixed, you can make the regex more reliable by anchoring the lookahead to the end of the string:
\w+(?=log\.\d+$)
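As a rough illustration of why anchoring helps, the sketch below wraps both lookahead forms in a capture group so they can be used in list context. It assumes the anchored form ends at $ right after the digits, with no trailing slash. The helper names and the /catalog/ path are made up for the example.

```perl
use strict;
use warnings;

# Unanchored: word characters immediately followed by "log" and a word boundary.
sub name_loose {
    my ($s) = @_;
    my ($name) = $s =~ /(\w+)(?=log\b)/;
    return $name;
}

# Anchored: "log" must be followed by ".digits" at the very end of the string.
sub name_anchored {
    my ($s) = @_;
    my ($name) = $s =~ /(\w+)(?=log\.\d+$)/;
    return $name;
}

# First path is from the question; the second is a hypothetical path with a
# directory name that itself ends in "log".
for my $s ('/user/mike/desktop/sys/syslog.1', '/catalog/syslog.1') {
    printf "%-34s loose: %-6s anchored: %s\n",
        $s,
        name_loose($s)    // '(no match)',
        name_anchored($s) // '(no match)';
}
```

For the path from the question both return sys. For the made-up /catalog/ path, the unanchored form stops at the first hit and returns cata, while the anchored form still returns sys.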
Upvotes: 2