Reputation: 24921
I have a text file that looks like this:
[22/Nov/2011 12:57:58] "GET /media/js/jquery-1.4.3.min.js HTTP/1.1" 304 0
[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/fancybox-y.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /ajax/pages/erlebnisse/ HTTP/1.1" 200 563
[22/Nov/2011 12:58:00] "GET /erlebnisse/alle-erlebnisse/ HTTP/1.1" 200 17114
I want to use regular expressions to get all the image files (.gif, .jpg, .png) that appear here. So the result from the text above should be:
['fancybox-x.png', 'fancybox-y.png', 'blank.gif']
What I did was:
re.findall('\w+\.(jpg|gif|png)', f.read())
So the pattern is: 1 or more word characters (\w+), followed by a dot (\.), and then 'jpg', 'gif' or 'png' (jpg|gif|png).
This actually works, but re.findall() treats the parentheses (which I'm using only for "grouping") as a capturing group and returns just the content of group(1), so the result is:
['png', 'png', 'gif']
Which is right, but incomplete. In other words, I'm asking: how can I make re.findall() distinguish between parentheses used only for "grouping" and parentheses that define capturing groups?
Upvotes: 2
Views: 7190
Reputation: 42490
You're looking for the non-capturing version of regular parentheses, (?:...). It is described in the re module docs.
s ='''[22/Nov/2011 12:57:58] "GET /media/js/jquery-1.4.3.min.js HTTP/1.1" 304 0
[22/Nov/2011 12:57:58] "GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/fancybox-y.png HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0
[22/Nov/2011 12:57:59] "GET /ajax/pages/erlebnisse/ HTTP/1.1" 200 563
[22/Nov/2011 12:58:00] "GET /erlebnisse/alle-erlebnisse/ HTTP/1.1" 200 17114'''
import re
# [-\w]+ also allows hyphens in the file name; (?:...) groups without capturing
for m in re.findall(r'([-\w]+\.(?:jpg|gif|png))', s):
    print(m)
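To make the difference concrete, here is a minimal sketch comparing the two kinds of parentheses on a single line from the log. The variable names are only illustrative.

```python
import re

line = '"GET /media/js/fancybox/blank.gif HTTP/1.1" 304 0'

# Capturing group: findall returns only what the group matched.
capturing = re.findall(r'\w+\.(jpg|gif|png)', line)

# Non-capturing group: the pattern has no groups, so findall
# returns the whole match instead.
non_capturing = re.findall(r'\w+\.(?:jpg|gif|png)', line)

print(capturing)      # ['gif']
print(non_capturing)  # ['blank.gif']
```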
Upvotes: 3
Reputation: 1705
You can just add another pair of parentheses around the whole file name, and put ?: at the start of the inner pair to make it non-capturing:
re.findall('/([^/]+\.(?:jpg|gif|png))', f.read())
Note that \w won't match "-", so I would suggest [^/]+ instead.
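As a quick check of that point, the sketch below runs three character classes against one of the hyphenated file names from the question. It is only an illustration on this sample line. Keep in mind that [^/]+ also matches spaces and quotes, so it may over-match on differently shaped input.

```python
import re

line = '"GET /media/js/fancybox/fancybox-x.png HTTP/1.1" 304 0'

# \w+ stops at the hyphen, so the name is truncated.
word_only = re.findall(r'\w+\.(?:jpg|gif|png)', line)

# Adding "-" to the character class keeps the hyphenated name intact.
with_hyphen = re.findall(r'[-\w]+\.(?:jpg|gif|png)', line)

# Matching everything after the last "/" also keeps the full name.
not_slash = re.findall(r'/([^/]+\.(?:jpg|gif|png))', line)

print(word_only)    # ['x.png']
print(with_hyphen)  # ['fancybox-x.png']
print(not_slash)    # ['fancybox-x.png']
```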
Upvotes: 3
Reputation: 9927
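If you're looking for the entire match, you can find it in group 0 of a match object (for example, one returned by re.search() or re.finditer()); re.findall() itself returns only the captured groups when the pattern contains any. Otherwise, you can add extra parentheses if you're looking for another part of the string.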
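A small sketch of the group 0 approach, assuming you switch from re.findall() to re.finditer() so that match objects are available. The original capturing pattern is kept, apart from allowing hyphens in the name.

```python
import re

line = '"GET /media/js/fancybox/fancybox-y.png HTTP/1.1" 304 0'
pattern = re.compile(r'[-\w]+\.(jpg|gif|png)')

# group(0) is the whole match; group(1) is the parenthesised extension.
whole = [m.group(0) for m in pattern.finditer(line)]
ext = [m.group(1) for m in pattern.finditer(line)]

print(whole)  # ['fancybox-y.png']
print(ext)    # ['png']
```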
Upvotes: 0