Reputation: 5895
I need a glob2- or Formic-like solution to search a large list of directories stored in a text file. (The files aren't on my machine; the file list is produced by an external process that I cannot directly access or query.)
Pseudo example:
# read the large directory list into memory
data = []
with open('C:\\log_file.txt', 'r') as log:
    data = log.readlines()
# query away!
query1 = listglob(data, '/**/fnord/*/log.*')
query2 = listglob(data, '/usr/*/model_*/fnord/**')
Unless someone has a suggestion, my next step is to open up glob2 and Formic and see if one of them can be changed to accept a list instead of a root folder to be "os.walked".
Upvotes: 2
Views: 1468
Reputation: 15
I don't think glob2.fnmatch.fnmatch is equivalent to the glob2 ** syntax. From what I can tell from reading the source code, it is equivalent to the fnmatch syntax. Also, Andrew's answer doesn't cover square brackets and the [!abc] example.
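To illustrate the point, here is a small check using the standard library's fnmatch module. This is a sketch under an assumption: it uses the stdlib fnmatch rather than glob2's bundled copy, on the expectation that the two behave alike for these cases. The sample paths are made up.

```python
import fnmatch

# Hypothetical path, chosen so that a single * would have to span several directories.
path = "/usr/a/b/fnord/x/log.txt"

# In fnmatch syntax, * also matches the / separator, so this single-star
# pattern accepts a path that is several directories deep.
star_crosses_slash = fnmatch.fnmatchcase(path, "/*/fnord/*/log.*")

# fnmatch does understand square brackets, including the negated [!abc] form.
negated_accepts = fnmatch.fnmatchcase("d.txt", "[!abc].txt")
negated_rejects = fnmatch.fnmatchcase("a.txt", "[!abc].txt")

print(star_crosses_slash, negated_accepts, negated_rejects)
```

So a plain * in fnmatch already behaves like a recursive wildcard, which is why it cannot distinguish * from **.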
Upvotes: 0
Reputation: 5895
In the end I used one of glob2's functions, like so:
import glob2
def listglob(data, pattern):
    # keep only the entries of data that match the pattern
    return [x for x in data if glob2.fnmatch.fnmatch(x, pattern)]
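One thing to watch for with this approach: readlines() keeps the trailing newline on every entry, which can stop patterns that end in a literal from matching. A minimal sketch of a variant that strips line endings first follows. It swaps in the standard library's fnmatch.fnmatchcase for glob2.fnmatch.fnmatch so that it runs without third-party packages (an assumption that the two match alike), and the sample data is hypothetical.

```python
import fnmatch

def listglob(lines, pattern):
    """Return the entries of lines that match an fnmatch-style pattern."""
    # Strip line endings left over from readlines() before matching.
    paths = (line.rstrip("\r\n") for line in lines)
    return [p for p in paths if fnmatch.fnmatchcase(p, pattern)]

# Hypothetical entries, shaped like the output of log.readlines().
sample = [
    "/usr/a/fnord/x/log.txt\n",
    "/usr/a/other/x/log.txt\n",
    "/usr/a/fnord\n",
]

hits = listglob(sample, "*/fnord")
print(hits)
```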
Upvotes: 1
Reputation: 19651
I would recommend using regular expressions. Ultimately, both Formic and glob use an OS call to perform the actual glob matching. So, if you want to modify either, you're going to have to write an RE matcher (or similar) in any case. So, cut out the middle-man and go straight to REs. (It pains me to say that because I'm the author of Formic.)
The basic plan is to write a function that takes in your glob and returns a regular expression. Here are some pointers:
- Escape '.', '-' and other RE reserved characters in your globs. E.g. '.' becomes \.
- ? in a glob file/directory name becomes [^/] (matches a single character that's not a /)
- * in a glob file/directory name as a regular expression is [^/]*
- /*/ glob as a regular expression is: /[^/]+/
- /**/ glob as a regular expression is: /([^/]+/)*
- Start the RE with ^ and end it with $. This forces the RE to expand over the whole string.
While I listed the substitutions in order of increasing complexity, it's probably a good idea to do the substitutions in the following order:
- Reserved characters ('.', '-', '$', etc.)
- ?
- /**/
- /*/
- *
This way you won't corrupt the /**/ when substituting for a single *.
In your question you have: /**/fnord/*/log.*
This would map to:
^/([^/]+/)*fnord/[^/]+/log\.[^/]*$
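The substitutions above could be sketched roughly as follows. Note that this is not a series of sequential replacements: it scans the glob once with an alternation that tries the longer constructs first, so text emitted for /**/ is never rewritten by the later * rule. The name glob_to_regex is hypothetical, and the handling of a bare ** (as in the question's second pattern) is an assumption that goes beyond the rules listed here.

```python
import re

# Alternatives are tried left to right, so longer glob constructs come first.
_TOKEN = re.compile(r"/\*\*/|/\*/|\*\*|\*|\?|.", re.DOTALL)

def glob_to_regex(pattern):
    """Translate a glob into an anchored regular expression string."""
    def repl(match):
        token = match.group(0)
        if token == "/**/":
            return "/([^/]+/)*"   # zero or more whole directories
        if token == "/*/":
            return "/[^/]+/"      # exactly one directory
        if token == "**":
            return ".*"           # ASSUMPTION: bare ** matches anything
        if token == "*":
            return "[^/]*"        # anything within one path component
        if token == "?":
            return "[^/]"         # a single non-separator character
        return re.escape(token)   # literal character, escaped if reserved
    return "^" + _TOKEN.sub(repl, pattern) + "$"

print(glob_to_regex("/**/fnord/*/log.*"))
print(glob_to_regex("/usr/*/model_*/fnord/**"))
```

Square-bracket character classes are not handled in this sketch; the brackets would simply be escaped as literals.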
Once you've built your RE, then finding matches is a simple exercise.
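For example, filtering an in-memory list with a compiled RE might look like the sketch below. The regular expression is written out by hand for /**/fnord/*/log.* with a trailing $ anchor; the helper name list_regex and the sample paths are hypothetical.

```python
import re

# Regular expression for the glob /**/fnord/*/log.* written out by hand.
LOG_RE = re.compile(r"^/([^/]+/)*fnord/[^/]+/log\.[^/]*$")

def list_regex(paths, compiled):
    """Return the paths matched by an already compiled regular expression."""
    return [p for p in paths if compiled.match(p)]

# Hypothetical sample data standing in for the contents of the log file.
sample = [
    "/fnord/run1/log.txt",
    "/usr/a/fnord/run2/log.1",
    "/usr/a/fnord/run2/deep/log.1",
    "/usr/a/other/run3/log.txt",
]

matches = list_regex(sample, LOG_RE)
print(matches)
```

Compiling the expression once and reusing it keeps repeated queries over a large list cheap.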
Upvotes: 2