Reputation: 6510
This is an example that searches PDF files in the current directory.
import os, os.path
import re
def print_pdf (arg, dir, files):
for file in files:
path = os.path.join(dir, file)
path = os.path.normcase(path)
if re.search(r".*\.pdf", path):
print path
os.path.walk('.', print_pdf, 0)
Could anyone explain what r".*\.pdf"
means here?
Why ".*\"
?
Thanks!
Upvotes: 0
Views: 428
Reputation: 342443
use os.walk() instead. And there's no need to use regex.
for r,d,f in os.walk(path):
for files in f:
if files[-4:].lower() == ".pdf":
print "found pdf: ",os.path.join(r,files)
Upvotes: 0
Reputation: 319621
it means any character zero or more times, followed by the literal dot and letters pdf (due to the greedy nature of the asterisk, it's basically guaranteed that the '.pdf'
are going to be at the end of the subject string).
There is glob
module to do this the right way:
>>> glob.glob(os.path.join(dirname, '*.pdf'))
Upvotes: 8
Reputation: 326
The period (.)
will match any character except newlines
The following asterisk (*)
means unlimited number of repetitions
of the preceding period
The backslash ()
escapes the period in .pdf So it looks for a real
period, so ONLY the .pdf and not "any
character".pdf again'
So in the end it looks for Any piece of text that ends in .pdf
Upvotes: 0
Reputation: 545618
Why
".*\"
?
Wrong question, you missed a crucial character of the expression. ;-)
In fact, .*
will match any character (.
in regex), as many times as possible (*
in regex; relates to the previous string, so .
in this case).
\.
, on the other hand, will match exactly one dot (.
). \
escapes the following character (.
) so it does no longer have its special meaning (e.g. in this case “match any character”) but rather it will be treated as-is.
Upvotes: 3
Reputation: 1072
This searches for a string containing zero or more chars followed by ".pdf" The .* is a common idiom in regexps and it means match any char 0 or more times. The . is because in regexps, the . has a special meaning, and the \ escapes that.
Upvotes: 1
Reputation: 798744
The .
means match any character but "\n". The *
means "repeat the previous character 0 or more times". The \.
matches an actual ".".
BTW, this is all in the docs.
Upvotes: 2