Micke

Reputation: 2406

Regular expression to match only the first file in a RAR file set

To see what file to invoke the unrar command on, one needs to determine which file is the first in the file set.

Here are some sample file names, of which - naturally - only the first group should be matched:

yes.rar
yes.part1.rar
yes.part01.rar
yes.part001.rar

no.part2.rar
no.part02.rar
no.part002.rar
no.part011.rar

One (limited) way to do it with PCRE-compatible regexps is this:

.*(?:(?<!part\d\d\d|part\d\d|\d)\.rar|\.part0*1\.rar)

However, this did not work in Ruby when I tested it at Rejax.

How would you write one Ruby compatible regular expression to match only the first file in a set of RAR files?

Upvotes: 1

Views: 2159

Answers (4)

Welbog

Reputation: 60418

Don't rely on the names of the files to determine which one is first. You're going to end up finding an edge case where you get the wrong file.

RAR's headers will tell you which file is the first one in the volume set, assuming the archive was created with a somewhat recent version of RAR.

HEAD_FLAGS (2 bytes) Bit flags:

0x0100 - First volume (set only by RAR 3.0 and later)

So open each file and examine the RAR headers, looking specifically for the flag that marks the first volume. As long as the archive isn't corrupt and was created with RAR 3.0 or later, this check is reliable. I have done my own tests with spanning RAR archives, and their headers are correct according to the link above.

This is a much, much safer way of determining which file is first in a set like this.
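A minimal sketch of that header check in Ruby follows. It assumes the older RAR 1.5 to 4.x layout that the quoted flag table describes: a 7-byte marker block, then the main archive header with HEAD_CRC (2 bytes), HEAD_TYPE (1 byte, 0x73) and HEAD_FLAGS (2 bytes, little-endian). Archives in the newer RAR 5.0 format are laid out differently and are reported as unrecognised here. The names first_volume_header? and first_volume? are illustrative only, and treating a non-volume archive as "first" is my own assumption.

```ruby
# Sketch only: offsets assume the RAR 1.5-4.x layout (7-byte marker block
# followed by the main archive header). RAR 5.0 archives use another format.
RAR4_MARKER    = "Rar!\x1A\x07\x00".b
MAIN_HEAD_TYPE = 0x73    # HEAD_TYPE of the main archive header
FLAG_VOLUME    = 0x0001  # archive is one volume of a multi-volume set
FLAG_FIRST_VOL = 0x0100  # first volume (set only by RAR 3.0 and later)

# Returns true or false, or nil when the bytes are not a recognisable
# RAR 1.5-4.x header.
def first_volume_header?(bytes)
  bytes = bytes.to_s.b
  return nil if bytes.bytesize < 12 || bytes[0, 7] != RAR4_MARKER
  return nil unless bytes.getbyte(9) == MAIN_HEAD_TYPE

  flags = bytes[10, 2].unpack('v').first  # 16-bit little-endian HEAD_FLAGS
  # Assumption: an archive that is not a volume is the only file in its "set".
  return true if (flags & FLAG_VOLUME).zero?

  (flags & FLAG_FIRST_VOL) != 0
end

# Convenience wrapper that reads just the first 12 bytes of a file.
def first_volume?(path)
  File.open(path, 'rb') { |f| first_volume_header?(f.read(12)) }
end
```

Calling first_volume?(path) on each candidate file and keeping the ones that return true would then replace the file name matching entirely. A nil result means the header was not recognised and the file needs some other treatment.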

Upvotes: 4

Matthew Encinas

Reputation: 147

I am no regex expert, but here is my attempt:

^(yes|no)\.(rar|part0*1\.rar)$

Replace "yes|no" with the actual file name. I tested it against your examples to check that it matches only the first group, hence the "yes|no" in the regex.

UPDATE: Fixed as per the comment. I'm not sure why the user would not know the file name, so I did not fix that part...
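If the base name is known at run time, the substitution described above could be sketched like this. The helper name first_rar_pattern is hypothetical. Regexp.escape is used so that dots or other special characters in the file name are matched literally, and \A and \z anchor the whole string rather than a single line.

```ruby
# Hypothetical helper: builds an anchored pattern for one known base name.
def first_rar_pattern(base_name)
  /\A#{Regexp.escape(base_name)}\.(?:rar|part0*1\.rar)\z/
end

pattern = first_rar_pattern('yes')
%w[yes.rar yes.part01.rar yes.part02.rar].each do |name|
  # Prints the name and whether it counts as the first file of the set.
  puts "#{name}: #{!(pattern =~ name).nil?}"
end
```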

Upvotes: 0

bmdhacks

Reputation: 16411

The short answer is that a single regex like yours cannot do this in Ruby 1.8. Ruby 1.8 does not support lookbehind assertions (the (?<! part of your example regex), which is why your regex doesn't work. This leaves you with two options.

1) Use more than one regex to do it.

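# Returns true if filename looks like the first file of a RAR set.
def is_first_rar(filename)
    if ((filename =~ /part(\d+)\.rar$/) == nil)
        # No "partN" suffix: any plain .rar file is the first (and only) one.
        return (filename =~ /\.rar$/) != nil
    else
        # "partN" suffix: first only if N is 1 (to_i ignores leading zeros).
        return $1.to_i == 1
    end
end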

2) Use the regex engine for Ruby 1.9, Oniguruma. It supports lookbehind assertions, and you can install it as a gem for Ruby 1.8. After that, you can do something like this:

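require 'rubygems'
require 'oniguruma'

def is_first_rar(filename)
    # Same pattern as in the question; Oniguruma accepts the (?<! lookbehind.
    reg = Oniguruma::ORegexp.new('.*(?:(?<!part\d\d\d|part\d\d|\d)\.rar|\.part0*1\.rar)')
    match = reg.match(filename)
    return match != nil
end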
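
Both options can be checked against the sample names from the question. The sketch below assumes Ruby 1.9 or later, where the built-in Regexp engine accepts the question's lookbehind directly, so no gem is needed for the second variant. The names first_rar?, FIRST_RAR_RE, FIRST_FILES and LATER_FILES are illustrative only.

```ruby
# Option 1: two plain regexes, no lookbehind required.
def first_rar?(filename)
  if filename =~ /part(\d+)\.rar$/
    $1.to_i == 1
  else
    !(filename =~ /\.rar$/).nil?
  end
end

# Option 2: the question's pattern as a native Regexp (assumes Ruby >= 1.9).
FIRST_RAR_RE = /.*(?:(?<!part\d\d\d|part\d\d|\d)\.rar|\.part0*1\.rar)/

# Sample names taken from the question.
FIRST_FILES = %w[yes.rar yes.part1.rar yes.part01.rar yes.part001.rar]
LATER_FILES = %w[no.part2.rar no.part02.rar no.part002.rar no.part011.rar]

(FIRST_FILES + LATER_FILES).each do |name|
  single = !(FIRST_RAR_RE =~ name).nil?
  puts "#{name}: two-regex=#{first_rar?(name)} single-regex=#{single}"
end
```

Like the original pattern, both variants are limited: a standalone archive whose name happens to end in a digit (for example backup2.rar) is rejected by the single regex, because the lookbehind forbids any digit directly before ".rar".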

Upvotes: 3

mweerden

Reputation: 14051

Personally I wouldn't use (extended) regular expressions in this case (or at least not just one to do it all). What's wrong with coding this in, for example, a few ifs?
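
As a rough illustration of the "few ifs" idea, here is a sketch that uses only plain string operations. The method name first_rar_by_ifs? is made up for this example, and treating a ".part" that is not followed purely by digits as part of the base name is my own assumption.

```ruby
def first_rar_by_ifs?(filename)
  return false unless filename.end_with?('.rar')

  stem = filename[0...-4]            # name without the ".rar" extension
  marker = stem.rindex('.part')
  return true if marker.nil?         # no ".part" suffix: single archive

  digits = stem[(marker + 5)..-1]    # whatever follows ".part"
  # ".part" not followed only by digits: treat it as part of the name.
  return true if digits.empty? || !digits.delete('0-9').empty?

  digits.to_i == 1                   # first volume only if the number is 1
end
```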

Upvotes: 0
