Extraction of a name from a string with variable number of separators

Question

I would like to extract part of a folder name which has the form 123456_Letters_CAPITAL_name_extension.

The name can be LETTERS123, LETTERS_123_LETTERS or LETTERS_LETTERS_123.

Currently, I'm extracting the name with unlist(strsplit(foldername, split="_"))[4:(length(unlist(strsplit(foldername, split="_")))-1)]

But I would like to be able to extract it if the _CAPITAL part is not present (it would be 3 instead of 4 but I would like to have a general way of doing it).

130615_Screen_II_SN_KB_3_lxb/, 130615_Screen_II_AL343_lxb/ and 130615_Screen_II_HK_344_LM_lxb/ are representative examples of complete folder names.

I tried but could not figure out a regex that would do that. Any ideas would be helpful.

Toto · Accepted Answer

How about this one:

^\d+_[a-zA-Z]+_(?:[A-Z]+_)?([A-Z]+\w+)_[^_]+$

The name will be in group 1.

A Perl way to test it:

use feature 'say';

my $re = qr~^\d+_[a-zA-Z]+_(?:[A-Z]+_)?([A-Z]+\w+)_[^_]+$~;
while (<DATA>) {
    chomp;
    say $1 if /$re/;
}
__DATA__
130615_Screen_II_SN_KB_3_lxb/
130615_Screen_II_AL343_lxb/
130615_Screen_II_HK_344_LM_lxb/
130615_Screen_HK_344_LM_lxb/

output:

SN_KB_3
AL343
HK_344_LM
HK_344_LM
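For readers who do not use Perl, the same check can be sketched in Python. This is only an illustration, not part of the original answer: the pattern is copied unchanged, while the helper name `extract_name` and the `folders` list are assumptions made for the example.

```python
import re

# Pattern copied from the answer above; group 1 holds the name.
PATTERN = re.compile(r"^\d+_[a-zA-Z]+_(?:[A-Z]+_)?([A-Z]+\w+)_[^_]+$")


def extract_name(foldername):
    """Return the name part of a folder name, or None if it does not match.

    `extract_name` is a hypothetical helper introduced for this sketch.
    """
    match = PATTERN.match(foldername)
    return match.group(1) if match else None


# Same sample data as the Perl test above.
folders = [
    "130615_Screen_II_SN_KB_3_lxb/",
    "130615_Screen_II_AL343_lxb/",
    "130615_Screen_II_HK_344_LM_lxb/",
    "130615_Screen_HK_344_LM_lxb/",
]

for folder in folders:
    print(extract_name(folder))
```

Run on the four sample folder names, this should print the same four names as the Perl version.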

Explanation:

The regular expression:

^\d+_[a-zA-Z]+_(?:[A-Z]+_)?([A-Z]+\w+)_[^_]+$

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
----------------------------------------------------------------------
  _                        '_'
----------------------------------------------------------------------
  [a-zA-Z]+                any character of: 'a' to 'z', 'A' to 'Z'
                           (1 or more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  _                        '_'
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    _                        '_'
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  _                        '_'
----------------------------------------------------------------------
  [^_]+                    any character except: '_' (1 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
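To see the optional group at work, the breakdown above can be written as a commented pattern. The following Python sketch is an assumption-laden illustration rather than the answer's own code: it turns the optional part into a named group (`capital`) purely so its value can be inspected, and `split_parts` is a hypothetical helper.

```python
import re

# Same structure as the regex explained above, laid out with re.VERBOSE.
# The named groups `capital` and `name` are additions for this sketch.
ANNOTATED = re.compile(
    r"""
    ^\d+_                       # leading digits, e.g. 130615, then '_'
    [a-zA-Z]+_                  # letters, e.g. Screen, then '_'
    (?:(?P<capital>[A-Z]+)_)?   # optional all-capital part, e.g. II
    (?P<name>[A-Z]+\w+)         # the name: capitals, then word characters
    _[^_]+$                     # '_' and a last part without underscores
    """,
    re.VERBOSE,
)


def split_parts(foldername):
    """Return (capital, name), or None when the folder name does not match."""
    match = ANNOTATED.match(foldername)
    if match is None:
        return None
    return match.group("capital"), match.group("name")


print(split_parts("130615_Screen_II_HK_344_LM_lxb/"))  # ('II', 'HK_344_LM')
print(split_parts("130615_Screen_HK_344_LM_lxb/"))     # (None, 'HK_344_LM')
```

In the second call the engine first tries to use HK_ as the optional part, but the name would then have to start at 344, which is not a capital letter. It backtracks, leaves the optional group unmatched, and captures HK_344_LM as the name.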
