Luke

Reputation: 467

Extract certain part of a string in Perl

I have the following Perl strings. Their lengths and patterns differ, but the file is always named like *log.999

my $file1 = '/user/mike/desktop/sys/syslog.1';
my $file2 = '/user/mike/desktop/movie/dnslog.2';
my $file3 = '/haselog.3';
my $file4 = '/user/mike/desktop/movie/dns-sys.log';

I need to extract the words before log. In this case, sys, dns, hase and dns-sys.

How can I write a regular expression to extract them?

Upvotes: 0

Views: 2202

Answers (2)

zdim

Reputation: 66873

The key property of the strings shown is that the log part comes at the end.

So anchor the pattern to the end of the string, so that it can't match a log somewhere in the middle:

my ($name) = $string =~ /(\w+)log\.[0-9]+$/;

or, if the .N extension is optional,

my ($name) = $string =~ /(\w+)log(?:\.[0-9]+)?$/;
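A minimal sketch for checking both patterns against the four paths from the question. The @files, @required and @optional arrays and the loop are scaffolding added here for illustration; only the two regexes come from the answer.

```perl
use strict;
use warnings;

# Sample paths copied from the question
my @files = (
    '/user/mike/desktop/sys/syslog.1',
    '/user/mike/desktop/movie/dnslog.2',
    '/haselog.3',
    '/user/mike/desktop/movie/dns-sys.log',
);

my (@required, @optional);
for my $string (@files) {
    # .N extension required
    my ($name_req) = $string =~ /(\w+)log\.[0-9]+$/;
    # .N extension optional
    my ($name_opt) = $string =~ /(\w+)log(?:\.[0-9]+)?$/;

    push @required, $name_req;   # undef when the pattern does not match
    push @optional, $name_opt;

    printf "%-40s required=%s optional=%s\n",
        $string, $name_req // '(no match)', $name_opt // '(no match)';
}
```

The first three paths yield sys, dns and hase under both patterns. The fourth matches neither, because the character right before log is a dot, which \w does not match.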

The above uses the \w+ pattern to capture the text preceding log. But that text may also contain non-word characters (-, ., etc.), in which case we would use [^/]+ to capture everything after the last /, as pointed out in Abigail's answer. With .N optional, as asked in the comments:

my ($name) = $string =~ m{ ([^/]+) log (?: \.[0-9]+ )? $}x;

where I added the /x modifier, with which spaces inside the pattern are ignored, which can aid readability.

I use a set of delimiters other than / to be able to use / inside without escaping it, and then the m is compulsory. The [^...] is a negated character class, matching any character not listed inside. So [^/]+log matches all successive characters which are not /, coming before log.
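Here is a small sketch of the [^/]+ pattern applied to the paths from the question. The %name_for and %trimmed hashes are hypothetical helpers, and the second pattern (lazy capture plus an optional dot outside it) is a variation added here as an assumption about what is wanted for names like dns-sys.log; it is not part of the answer.

```perl
use strict;
use warnings;

# Sample paths copied from the question
my @files = (
    '/user/mike/desktop/sys/syslog.1',
    '/user/mike/desktop/movie/dnslog.2',
    '/haselog.3',
    '/user/mike/desktop/movie/dns-sys.log',
);

my (%name_for, %trimmed);
for my $string (@files) {
    # Pattern from the answer: everything that is not a slash, before "log"
    my ($name) = $string =~ m{ ([^/]+) log (?: \.[0-9]+ )? $ }x;
    $name_for{$string} = $name;

    # Variation (assumption, not from the answer): lazy capture, and an
    # optional dot matched outside the capture so it is not kept
    my ($short) = $string =~ m{ ([^/]+?) \.? log (?: \.[0-9]+ )? $ }x;
    $trimmed{$string} = $short;

    printf "%-40s %-10s %s\n",
        $string, $name // '(no match)', $short // '(no match)';
}
```

For the fourth path the answer's pattern captures dns-sys. including the trailing dot, since a dot is also "not a slash". The variation returns dns-sys there and the same sys, dns and hase as before for the other three.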

The non-capturing group (?: ... ) groups the patterns inside it, so that ? applies to the whole group, without needlessly capturing them.
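To see what "needlessly capture" means in practice, this short sketch compares a plain group with a non-capturing one in list context. The array names are made up for the illustration.

```perl
use strict;
use warnings;

my $string = '/user/mike/desktop/sys/syslog.1';

# Plain parentheses: the extension becomes a second captured value
my @with_capture = $string =~ /(\w+)log(\.[0-9]+)?$/;

# Non-capturing group: only the name is returned
my @without_capture = $string =~ /(\w+)log(?:\.[0-9]+)?$/;

print scalar(@with_capture), " values: @with_capture\n";
print scalar(@without_capture), " value: @without_capture\n";
```

With the plain group the match returns two values (sys and .1), so the my ($name) = ... idiom would still work, but the extra capture is simply thrown away.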

The (?:\.[0-9]+)? pattern was written specifically to disallow things like log. (nothing after the dot) and log5. But if these are acceptable, change it to the simpler \.?[0-9]*
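The difference between the two suffix patterns can be checked with a few made-up names. The strings below are hypothetical test inputs, not taken from the question, and the %strict and %lenient hashes are just for collecting results.

```perl
use strict;
use warnings;

# Hypothetical inputs chosen to exercise the edge cases
my @names = ('syslog.5', 'syslog', 'syslog.', 'syslog5');

my (%strict, %lenient);
for my $string (@names) {
    # Strict suffix: either nothing, or a dot followed by at least one digit
    ($strict{$string})  = $string =~ /(\w+)log(?:\.[0-9]+)?$/;
    # Lenient suffix: optional dot, then any number of digits
    ($lenient{$string}) = $string =~ /(\w+)log\.?[0-9]*$/;

    printf "%-10s strict=%-12s lenient=%s\n",
        $string, $strict{$string} // '(no match)', $lenient{$string} // '(no match)';
}
```

The strict form accepts syslog.5 and syslog but rejects syslog. and syslog5; the lenient form accepts all four and captures sys each time.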

Update: corrected a typo in the code; for the optional .N the quantifier is +, not *

Upvotes: 1

Tim Pietzcker

Reputation: 336088

\w+(?=log\b)

matches one or more word characters (letters, digits or underscore) that are followed by log (but not logging etc.)

If the filename format is fixed, you can make the regex more reliable by using

\w+(?=log\.\d+$)
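A rough sketch of how the lookahead versions might be used from Perl. Because the lookahead consumes nothing, the whole match is the name, so the pattern is wrapped in parentheses here to get it back in list context. The anchored form below assumes the string ends right after the digits, with no trailing slash, and the path /var/syslog/notes.txt is a hypothetical input added to show why anchoring helps.

```perl
use strict;
use warnings;

my $file  = '/user/mike/desktop/sys/syslog.1';   # from the question
my $other = '/var/syslog/notes.txt';             # hypothetical: "log" only in a directory name

# Unanchored: any run of word characters followed by "log" and a word boundary
my ($loose_file)  = $file  =~ /(\w+(?=log\b))/;
my ($loose_other) = $other =~ /(\w+(?=log\b))/;

# Anchored: "log.N" must be the very end of the string
my ($fixed_file)  = $file  =~ /(\w+(?=log\.\d+$))/;
my ($fixed_other) = $other =~ /(\w+(?=log\.\d+$))/;

printf "unanchored: %s / %s\n", $loose_file // '(no match)', $loose_other // '(no match)';
printf "anchored:   %s / %s\n", $fixed_file // '(no match)', $fixed_other // '(no match)';
```

Both forms return sys for the path from the question. The unanchored form also returns sys for the hypothetical path, where log appears only in a directory name, while the anchored form does not match it.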

Upvotes: 2
