Reputation: 717
I have Python code that reads an input file and tries to find every occurrence of the following regular expression:
[fF][eE][bB] ([1-2][0-9]|[0-9])
Here is the code:
#!/usr/bin/python
import re
import sys
textFile = open(sys.argv[1], 'r')
fileText = textFile.read()
textFile.close()
matches = re.findall("[fF][eE][bB] ([1-2][0-9]|[0-9])",fileText)
print matches
My input file is:
1 2 3 the
the quick 2354
feb 1
feb 0
feb -10
feb23
feb 29
feb 3
february 10
However, when I run my code I get the following output: ['1','29', '3']
I want my output to be more like ['feb 1', 'feb 29', 'feb 3']
I am not really sure what I'm doing wrong. Any help would be greatly appreciated.
Upvotes: 2
Views: 240
Reputation: 8903
How about:
(feb\s[1-9][0-9]?)
Matches: feb followed by one whitespace character, a digit from 1 to 9, and then an optional second digit from 0 to 9.
Try:
matches = re.findall(r'(feb\s[1-9][0-9]?)',fileText)
print matches
>>> ['feb 1', 'feb 29', 'feb 3']
Caveat: this does not reject out-of-range days, so text like "feb 51" will still match.
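One way to handle that caveat is to capture the whole run of digits and check the range in Python rather than in the regex. The following is only a sketch: the function name find_feb_days, the max_day parameter, and the SAMPLE text are made up for illustration, and it is written for Python 3 (print is a function).

```python
import re

# Sample text standing in for the file contents (lines from the question, plus "feb 51").
SAMPLE = "feb 1\nfeb 0\nfeb -10\nfeb23\nfeb 29\nfeb 3\nfebruary 10\nfeb 51\n"

# Capture every digit of the day so "feb 51" cannot be cut short to "feb 5".
PATTERN = re.compile(r"\bfeb\s(\d+)\b", re.IGNORECASE)

def find_feb_days(text, max_day=29):
    """Return the 'feb N' strings whose day number lies in 1..max_day."""
    results = []
    for match in PATTERN.finditer(text):
        day = int(match.group(1))
        if 1 <= day <= max_day:
            results.append(match.group(0))
    return results

print(find_feb_days(SAMPLE))  # ['feb 1', 'feb 29', 'feb 3']
```

Doing the range check on the integer keeps the pattern short and makes the upper bound easy to change.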
Upvotes: 0
Reputation: 71598
You should read the documentation: re.findall returns only the capture groups when there are any in the expression. Simply make the group in your regex non-capturing:
matches = re.findall("[fF][eE][bB] (?:[1-2][0-9]|[0-9])",fileText)
                                    ^^
That said, this regex will also match feb 0, so you might want to use this instead:
[fF][eE][bB] (?:[1-2][0-9]|[1-9])
                            ^
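To make the difference concrete, here is a small sketch comparing the three patterns on a few lines from the question. The variable names and the shortened sample text are only for illustration, and the code is written for Python 3 (print is a function).

```python
import re

text = "feb 1\nfeb 0\nfeb 29\n"

# One capturing group: findall returns only what the group matched.
captured = re.findall(r"[fF][eE][bB] ([1-2][0-9]|[0-9])", text)

# Non-capturing group: findall returns the whole match.
whole = re.findall(r"[fF][eE][bB] (?:[1-2][0-9]|[0-9])", text)

# Same, but with [1-9] so that "feb 0" is rejected.
no_zero = re.findall(r"[fF][eE][bB] (?:[1-2][0-9]|[1-9])", text)

print(captured)  # ['1', '0', '29']
print(whole)     # ['feb 1', 'feb 0', 'feb 29']
print(no_zero)   # ['feb 1', 'feb 29']
```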
Now, you can make the regex shorter if you use re.IGNORECASE (to make the regex match both uppercase and lowercase characters), and you can read the file contents in a loop (this is more efficient for large files). Also, it's good practice to write your regex patterns as raw strings:
with open(sys.argv[1], 'r') as textFile:
    for line in textFile:
        # re.match only looks for a match at the start of each line
        matches = re.match(r"feb (?:[1-2][0-9]|[1-9])", line, re.IGNORECASE)
        if matches:
            print matches.group()
And of course, you can put the matches in a list too if you need a list in the end.
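A minimal sketch of that list-building variant follows. The helper name collect_matches is hypothetical, io.StringIO stands in for the opened file so the example runs on its own, and the code is written for Python 3 (print is a function).

```python
import io
import re

PATTERN = re.compile(r"feb (?:[1-2][0-9]|[1-9])", re.IGNORECASE)

def collect_matches(lines):
    """Return the matched text for every line that starts with a match."""
    found = []
    for line in lines:
        m = PATTERN.match(line)  # match() anchors at the start of the line
        if m:
            found.append(m.group())
    return found

# io.StringIO stands in for the file object returned by open().
sample_file = io.StringIO("the quick 2354\nfeb 1\nfeb 0\nFeb 29\nfeb 3\nfebruary 10\n")
print(collect_matches(sample_file))  # ['feb 1', 'Feb 29', 'feb 3']
```

With a real file, the same function can be called as collect_matches(textFile) inside the with block.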
Upvotes: 3
Reputation: 2006
If you want the output to look like this:
['feb 1', 'feb 29', 'feb 3']
you can use the code below:
matches = re.findall("(feb (?:[12][0-9]|[1-9]))",fileText)
print matches
Upvotes: 1
Reputation: 189936
You need parentheses around the entire expression in order for matches to contain the matched text. There will be one match group for each pair of capturing parentheses; you can use (feb (?:[12][0-9]|[1-9])) to get grouping without capturing for the inner group.
However, given your examples, perhaps you actually want to print the entire input line when there is a match?
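If whole lines are what you are after, a sketch along these lines might do. The function name matching_lines and the sample text are made up for illustration, and the code is written for Python 3 (print is a function).

```python
import re

PATTERN = re.compile(r"feb (?:[12][0-9]|[1-9])", re.IGNORECASE)

def matching_lines(text):
    """Return every line of text that contains a match anywhere in it."""
    return [line for line in text.splitlines() if PATTERN.search(line)]

sample = "1 2 3 the\nfeb 1\nfeb 0\nfeb 29 was a leap day\nfebruary 10\n"
print(matching_lines(sample))  # ['feb 1', 'feb 29 was a leap day']
```

Here search() is used instead of match() so that a date in the middle of a line also counts.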
Upvotes: 1