Jirka Kopřiva

Reputation: 3099

php regex - clean file names

I'm looking for an elegant regular expression that removes bracketed or parenthesised tokens that look like file names, and strips the brackets from everything else.

[Nibh justo] elit Nulla [link.pdf]  auctor ipsum molestie (link.pdf) 
Condimentum euismod non [link.xls](link.xls) [link.doc](link.doc) tempus 
In [Curabitur] et

The result should be:

Nibh justo elit Nulla auctor ipsum molestie Condimentum euismod 
non tempus In Curabitur et

I trust there must be a short way to do that. ("File" simply means that a dot is included; no sentence check is necessary.)

Thanks for the help.

Upvotes: 3

Views: 532

Answers (2)

Cylian

Reputation: 11181

Try this:

(?:[\[\(]\w+\.\w+[\]\)])|(?:[\[\(](?=[0-9A-Za-z]))|(?:(?<=[0-9A-Za-z])[\]\)])

$result = preg_replace('/(?:[[(]\w+\.\w+[\])])|(?:[[(](?=[0-9A-Za-z]))|(?:(?<=[0-9A-Za-z])[\])])/m', '', $subject);

Explanation:

(?:[\[\(]\w+\.\w+[\]\)])|(?:[\[\(](?=[0-9A-Za-z]))|(?:(?<=[0-9A-Za-z])[\]\)])

Options: ^ and $ match at line breaks

Match either the regular expression below (attempting the next alternative only if this one fails) «(?:[\[\(]\w+\.\w+[\]\)])»
   Match the regular expression below «(?:[\[\(]\w+\.\w+[\]\)])»
      Match a single character present in the list below «[\[\(]»
         A [ character «\[»
         A ( character «\(»
      Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      Match the character “.” literally «\.»
      Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      Match a single character present in the list below «[\]\)]»
         A ] character «\]»
         A ) character «\)»
Or match regular expression number 2 below (attempting the next alternative only if this one fails) «(?:[\[\(](?=[0-9A-Za-z]))»
   Match the regular expression below «(?:[\[\(](?=[0-9A-Za-z]))»
      Match a single character present in the list below «[\[\(]»
         A [ character «\[»
         A ( character «\(»
      Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=[0-9A-Za-z])»
         Match a single character present in the list below «[0-9A-Za-z]»
            A character in the range between “0” and “9” «0-9»
            A character in the range between “A” and “Z” «A-Z»
            A character in the range between “a” and “z” «a-z»
Or match regular expression number 3 below (the entire match attempt fails if this one fails to match) «(?:(?<=[0-9A-Za-z])[\]\)])»
   Match the regular expression below «(?:(?<=[0-9A-Za-z])[\]\)])»
      Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=[0-9A-Za-z])»
         Match a single character present in the list below «[0-9A-Za-z]»
            A character in the range between “0” and “9” «0-9»
            A character in the range between “A” and “Z” «A-Z»
            A character in the range between “a” and “z” «a-z»
      Match a single character present in the list below «[\]\)]»
         A ] character «\]»
         A ) character «\)»
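
For readers who find the breakdown easier to follow next to the pattern itself, here is a minimal sketch of the same three alternatives written with PCRE's `x` (extended) flag, which ignores whitespace in the pattern and allows `#` comments. The shortened sample string and the variable names are only illustrative.

```php
<?php
// Same three alternatives as in the answer, spread over several lines.
// With the x flag, whitespace outside character classes is ignored and
// "#" starts a comment that runs to the end of the line.
$pattern = '/
    [\[(] \w+ \. \w+ [\])]      # a bracketed token containing a dot, e.g. [link.pdf]
  | [\[(] (?=[0-9A-Za-z])       # an opening bracket directly before a letter or digit
  | (?<=[0-9A-Za-z]) [\])]      # a closing bracket directly after a letter or digit
/x';

// Shortened sample input (an assumption for illustration).
$subject = '[Nibh justo] elit Nulla [link.pdf] auctor (link.pdf) In [Curabitur] et';

$cleaned = preg_replace($pattern, '', $subject);
echo $cleaned, PHP_EOL;
```

Note that the comments inside the pattern must not contain the delimiter (`/` here) or a single quote, since PHP would treat either as the end of the pattern or of the string.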

When the above regex is applied to:

[Nibh justo] elit Nulla [link.pdf]  auctor ipsum molestie (link.pdf) 
Condimentum euismod non [link.xls](link.xls) [link.doc](link.doc) tempus 
In [Curabitur] et

it produces the required result (apart from some leftover whitespace):

Nibh justo elit Nulla   auctor ipsum molestie  
Condimentum euismod non   tempus 
In Curabitur et
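
The output above still contains the spaces that surrounded the removed tokens. If single spaces are wanted, as in the question, one possible follow-up is a second pass that collapses them. This is only a sketch: the function name `clean_file_brackets` is hypothetical, and keeping the line breaks intact is an assumption about what the asker wants.

```php
<?php
// Hypothetical helper (the name is illustrative, not part of the answer).
function clean_file_brackets(string $subject): string
{
    // Pattern taken unchanged from the preg_replace call in the answer.
    $pattern = '/(?:[[(]\w+\.\w+[\])])|(?:[[(](?=[0-9A-Za-z]))|(?:(?<=[0-9A-Za-z])[\])])/m';
    $stripped = preg_replace($pattern, '', $subject);

    // Collapse runs of spaces and tabs left behind by the removed tokens.
    $collapsed = preg_replace('/[ \t]+/', ' ', $stripped);

    // Drop the space that may remain at the end of each line.
    $collapsed = preg_replace('/ +$/m', '', $collapsed);

    return trim($collapsed);
}

$subject = "[Nibh justo] elit Nulla [link.pdf]  auctor ipsum molestie (link.pdf) \n"
         . "Condimentum euismod non [link.xls](link.xls) [link.doc](link.doc) tempus \n"
         . "In [Curabitur] et";

$cleaned = clean_file_brackets($subject);
echo $cleaned, PHP_EOL;
```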

Upvotes: 2

Julien

Reputation: 1296

Something like this?

$str = '[Nibh justo] elit Nulla [link.pdf]  auctor ipsum molestie (link.pdf) 
Condimentum euismod non [link.xls](link.xls) [link.doc](link.doc) tempus In 
[Curabitur] and other [./beta/link.pfd]';

$str = preg_replace('`(\(|\[)[\w/\.-]+\.[a-z]+(\)|\])`i', '', $str);
$str = str_replace(array('[', ']'), '', $str);

echo $str;

The result (with whitespace collapsed) is:

Nibh justo elit Nulla auctor ipsum molestie Condimentum euismod 
non tempus In Curabitur and other
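
The character class `[\w/\.-]` is what lets this pattern remove path-like names such as `[./beta/link.pfd]`. A small comparison sketch follows. The second, word-characters-only pattern corresponds to the first alternative of the other answer, and the sample string and variable names are assumptions made for illustration.

```php
<?php
// Sample input with one path-like name and one plain file name (illustrative).
$subject = 'other [./beta/link.pfd] and [plain.txt]';

// Pattern from this answer: slashes, dots and hyphens are allowed before the extension.
$pathAware = '`(\(|\[)[\w/\.-]+\.[a-z]+(\)|\])`i';

// Stricter pattern for comparison: only word characters around a single dot.
$wordOnly = '`[\[(]\w+\.\w+[\])]`';

$withPaths = preg_replace($pathAware, '', $subject);
$withoutPaths = preg_replace($wordOnly, '', $subject);

echo $withPaths, PHP_EOL;     // both bracketed names are removed
echo $withoutPaths, PHP_EOL;  // the path-like name is left in place
```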

Upvotes: 3
