dzm

Reputation: 23534

Regex for extracting S'#'E'#'

We have a bunch of files with names like 'My.File.S01E01.something.something', 'My.File.S01E02.another.something', or 'My_File.S1E1.something'.

What I'm trying to do is extract the text before the S01E01 or S1E1 (for the examples above, that would be My.File. or My_File.) and also extract the values for S and E, e.g. S01 (S = 01) or S1 (S = 1).

Does anyone know how I can do this, or can you point me in the right direction? I'm really not sure where to start.

Upvotes: 0

Views: 254

Answers (3)

4m1r

Reputation: 12542

This is more of an algorithm than a simple regex.
http://jsfiddle.net/GqJPU/

var file1 = 'My.File.S01E01.something.something';
var file2 = 'My.File.S01E02.another.something';
var file3 = 'My_File.S1E1.something';

var myFiles = [file1, file2, file3];

myFiles.forEach(function(file){
    console.log(returnFileDefs(file));
});

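function returnFileDefs(element){
    var myResult = {};
    // Split on literal dots; a plain '.' string is enough here.
    var mySplit = element.split('.');
    var front = mySplit[0];

    // If the first part contains an underscore (My_File), the name is a single part.
    if(/_/.test(front)){
        myResult.front = front;
        myResult.middle = mySplit[1];
    }else{
        // Otherwise the name spans the first two parts (My.File).
        myResult.front = mySplit[0]+'.'+mySplit[1];
        myResult.middle = mySplit[2];
    }
    // Capture the digits after S and E in the S#E# part.
    myResult.S = myResult.middle.match(/S([0-9]+)/)[1];
    myResult.E = myResult.middle.match(/E([0-9]+)/)[1];
    return myResult;
}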
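
The function above assumes the S#E# part sits at a fixed index after splitting. As a minimal sketch of the same split-based idea without that assumption, the parts can be scanned for the first one that looks like S<digits>E<digits>. The name findEpisodeToken is hypothetical, and the sketch assumes the parts are separated by dots.

```javascript
// Hypothetical helper (not from the answer above): scan the dot-separated
// parts for the first one shaped like S<digits>E<digits>.
function findEpisodeToken(fileName) {
    var parts = fileName.split('.');
    for (var i = 0; i < parts.length; i++) {
        var m = /^S(\d+)E(\d+)$/i.exec(parts[i]);
        if (m) {
            return {
                // Everything before the token, with the trailing dot restored.
                front: parts.slice(0, i).join('.') + (i > 0 ? '.' : ''),
                S: m[1],
                E: m[2]
            };
        }
    }
    return null; // no S#E# part found
}

console.log(findEpisodeToken('My.File.S01E01.something.something'));
console.log(findEpisodeToken('My_File.S1E1.something'));
```

For the first name this returns front 'My.File.', S '01' and E '01'; for the second, front 'My_File.', S '1' and E '1'.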

Upvotes: 0

Reactgular

Reputation: 54761

You can match all the text before S## and not worry about matching S##E## specifically. The key is to make sure there is no number or letter directly before the S. For example, a name like filenames02 should be ignored as invalid, because the s there is part of a word.

Also, some torrent names contain spam at the front, such as [torrent.com]My.File.S01E01.zip. You can skip that as well (if you want).

The regex to match just the filename is /^(\[.+\])?(.+[^a-z0-9])(?=S\d)/i. If you want to include the spam, then /^(.+[^a-z0-9])(?=S\d)/i is all you need.

var names = [
    "My.File.S1E1.something",
    "My File S01E01.something",
    "[--spam--] My.File.S01E01.something",
    "My.Files S01 something",

    // this one won't match (a letter comes directly before the S)
    "My.FilesS01E01.something"
];
for(var i=0; i < names.length; i++)
{
    // match() returns null when a name does not fit the pattern, so check before indexing.
    var match = names[i].match(/^(\[.+\])?(.+[^a-z0-9])(?=S\d)/i);
    if(match)
    {
        $('body').append('<div>'+match[2]+'</div>');
    }
}

http://jsfiddle.net/thinkingmedia/fZDM4/
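
If the S and E values are needed as well, one option is to turn the lookahead into capture groups. The following is only a sketch under a few assumptions: the name parseEpisodeName is hypothetical, both S and E must be present (a name like "My.Files S01 something" is rejected), and a lazy quantifier is used so the prefix stops at the first S#E# token.

```javascript
// Sketch: the prefix pattern with the lookahead replaced by capture groups
// for the season and episode digits. Not the answer's original regex.
var EPISODE_RE = /^(\[.+?\])?\s*(.*?[^a-z0-9])S(\d+)E(\d+)/i;

// Hypothetical helper: returns null when the name does not fit the pattern.
function parseEpisodeName(name) {
    var m = EPISODE_RE.exec(name);
    if (!m) {
        return null;
    }
    return { prefix: m[2], season: m[3], episode: m[4] };
}

console.log(parseEpisodeName('My.File.S01E01.something'));
console.log(parseEpisodeName('[--spam--] My.File.S01E01.something'));
console.log(parseEpisodeName('My.FilesS01E01.something')); // null
```

The first two calls both give prefix 'My.File.', season '01' and episode '01'; the optional bracketed group and the whitespace after it are skipped rather than returned.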

Upvotes: 1

Eduardo Cuomo

Reputation: 18937

Try:

var file = 'My.File.S01E02.something.something';
var S = file.match(/(?:\.S)([0-9]+)(?:E[0-9]+\.)/)[1];
var E = file.match(/(?:\.S[0-9]+E)([0-9]+)(?:\.)/)[1];

This gives:

S === "01"
E === "02"
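
Since the question treats S01 and S1 as the same season, it may help to read both values in one match and convert them to numbers. A minimal sketch, assuming the token is surrounded by dots as in the patterns above; seasonEpisode is a hypothetical name.

```javascript
// Hypothetical helper: one match for both values, converted to numbers
// so that "01" and "1" compare equal.
function seasonEpisode(file) {
    var m = file.match(/\.S([0-9]+)E([0-9]+)\./);
    if (!m) {
        return null; // the name has no .S#E#. token
    }
    return { S: parseInt(m[1], 10), E: parseInt(m[2], 10) };
}

console.log(seasonEpisode('My.File.S01E02.something.something')); // S is 1, E is 2
console.log(seasonEpisode('My_File.S1E1.something'));             // S is 1, E is 1
```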

Upvotes: 0
