Reputation: 717

Regular Expression Python

I am trying to read an input file and find all matches of the given regular expression:

[fF][eE][bB] ([1-2][0-9]|[0-9])

I have written the following Python code:

#!/usr/bin/python
import re
import sys

textFile = open(sys.argv[1], 'r')
fileText = textFile.read()
textFile.close()
matches = re.findall("[fF][eE][bB] ([1-2][0-9]|[0-9])",fileText)
print matches

My input file is:

1 2 3 the
the quick 2354
feb 1 
feb 0
feb -10
feb23
feb 29
feb 3
february 10

However, when I run my code I get the following output: ['1', '29', '3']

I want my output to be more like ['feb 1', 'feb 29', 'feb 3']

I am not really sure what I'm doing wrong. Any help would be greatly appreciated.

Upvotes: 2

Answers (4)

e h

Reputation: 8903

How about:

(feb\s[1-9][0-9]?)

Matches: feb followed by a whitespace character, a digit between 1 and 9, and then an optional digit between 0 and 9.

Try:

matches = re.findall(r'(feb\s[1-9][0-9]?)',fileText)
print matches
>>> ['feb 1', 'feb 29', 'feb 3']

Caveat: this does not reject out-of-range days, so text like "feb 51" will also match.
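
One possible way to handle that caveat is to limit the day to 1 through 29 and forbid a trailing digit with a negative lookahead. This is only a sketch: the sample text is made up for illustration, and it assumes 29 is the highest day you want to accept.

```python
import re

# Hypothetical sample text, not the asker's actual file.
sample = "feb 1\nfeb 0\nfeb 29\nfeb 51\nfeb 3\n"

# [12][0-9] covers 10-29 and [1-9] covers 1-9.
# (?!\d) rejects a following digit, so "feb 51" is not accepted as "feb 5".
pattern = re.compile(r"feb\s(?:[12][0-9]|[1-9])(?!\d)")

matches = pattern.findall(sample)
print(matches)  # ['feb 1', 'feb 29', 'feb 3']
```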


Upvotes: 0

Jerry

Reputation: 71598

You should read the documentation. re.findall returns only the capture groups when there are any in the expression. You should simply turn the capture group in your regex into a non-capturing group:

matches = re.findall("[fF][eE][bB] (?:[1-2][0-9]|[0-9])",fileText)
                                    ^^

That said, this regex will also match feb 0, so you might want to use

[fF][eE][bB] (?:[1-2][0-9]|[1-9])
                            ^

Instead.
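
To see the difference side by side, here is a small sketch that runs both forms of the pattern against a made-up sample string (the sample is an assumption, not the asker's file):

```python
import re

# Hypothetical sample text for illustration.
sample = "feb 1\nfeb 0\nfeb 29\n"

# With a capturing group, findall returns only the text of the group.
captured = re.findall(r"[fF][eE][bB] ([1-2][0-9]|[1-9])", sample)

# With a non-capturing group (?:...), findall returns the whole match.
whole = re.findall(r"[fF][eE][bB] (?:[1-2][0-9]|[1-9])", sample)

print(captured)  # ['1', '29']
print(whole)     # ['feb 1', 'feb 29']
```

In both cases "feb 0" is skipped because the single-digit branch is [1-9].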

Now, you can make the regex shorter if you use re.IGNORECASE (to make the regex match both uppercase and lowercase characters), and you can read the file line by line in a loop (this is more memory-efficient for large files). Also, it's good practice to write your regex patterns as raw strings. Note that re.match only matches at the start of each line:

with open(sys.argv[1], 'r') as textFile:
    for line in textFile:
        matches = re.match(r"feb (?:[1-2][0-9]|[1-9])", line, re.IGNORECASE)
        if matches:
            print matches.group()

And of course, you can put the matches in a list too if you need a list in the end.
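
A rough sketch of collecting the matches in a list could look like this. The lines list is a hypothetical stand-in for the file so the snippet runs on its own; with a real file you would iterate over the open file object instead.

```python
import re

# Hypothetical lines standing in for the input file.
lines = ["the quick 2354\n", "feb 1 \n", "Feb 29\n", "on feb 3\n"]

pattern = re.compile(r"feb (?:[1-2][0-9]|[1-9])", re.IGNORECASE)

found = []
for line in lines:
    m = pattern.match(line)  # match() only succeeds at the start of the line
    if m:
        found.append(m.group())

print(found)  # ['feb 1', 'Feb 29']
```

The last line is not picked up because match() is anchored at the beginning of the string; pattern.search(line) would find "feb 3" there.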

Upvotes: 3

Sakeer

Reputation: 2006

If you want the output to look like this:

['feb 1', 'feb 29', 'feb 3']

you can use the code below:

    matches = re.findall("(feb (?:[12][0-9]|[1-9]))",fileText)
    print matches

Upvotes: 1

tripleee

Reputation: 189936

You need parentheses around the entire expression in order for matches to contain the matched text. There will be one match group for each pair of parens; you can use (feb (?:[12][0-9]|[1-9])) to have grouping without capturing for the second group.
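
As a small illustration of "one match group for each pair of parens", here is a sketch on a made-up sample string. With one capturing pair, findall gives a list of strings; with two capturing pairs, it gives a list of tuples.

```python
import re

# Hypothetical sample text for illustration.
sample = "feb 1\nfeb 29\n"

# One capturing pair around everything, inner group is non-capturing.
outer_only = re.findall(r"(feb (?:[12][0-9]|[1-9]))", sample)

# Two capturing pairs: each result is a tuple with one item per group.
both = re.findall(r"(feb ([12][0-9]|[1-9]))", sample)

print(outer_only)  # ['feb 1', 'feb 29']
print(both)        # [('feb 1', '1'), ('feb 29', '29')]
```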

However, given your examples, perhaps you actually want to print the entire input line when there is a match?
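
If that is what you are after, a minimal sketch might keep every line that contains a match. The lines below are copied from the sample input in the question, and using re.search (rather than re.match) is my assumption about the behavior you want.

```python
import re

# A few lines taken from the sample input in the question.
lines = ["1 2 3 the", "feb 1 ", "feb 0", "feb23", "feb 29", "february 10"]

pattern = re.compile(r"feb (?:[12][0-9]|[1-9])", re.IGNORECASE)

# search() looks for the pattern anywhere in the line.
matching_lines = [line for line in lines if pattern.search(line)]

for line in matching_lines:
    print(line)
```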

Upvotes: 1
