Reputation: 2364
I have a file that contains random, junk ASCII characters.
However, the file also contains a message written in English.
Like this:
...˜ÃÕ=òaãNÜ ß§#üxwáã MESSAGE HIDDEN IN HERE ŸÎ=N‰çÈ^XvU…”vN˜...
I'm trying to write a Python regex that looks for a pattern beginning with 6 letters or spaces and ending with 6 letters or spaces.
That way, as long as the message meets that minimum length in letters or spaces, the regex should output the message.
This is what I've come up with, but it doesn't seem to be working.
regex = re.compile('''
([A-Z ]){6,}
([A-Z ]){6,}
''', re.I | re.X )
Upvotes: 1
Views: 10352
Reputation: 13693
Your Regex:
([A-Z ]){6,}
([A-Z ]){6,}
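Doesn't work as quoted because the line break between the two groups is part of the pattern, so the regex expects that whitespace in the input (unless re.X is set, which makes the engine ignore it).
Was this what you were looking for?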
import re
reg = re.compile("[A-Z ]{6,}[A-Z ]{6,}")
string = "...˜ÃÕ=òaãNÜ ß§#üxwáã MESSAGE HIDDEN IN HERE ŸÎ=N‰çÈ^XvU…”vN˜..."
print reg.findall(string)
Output:
[' MESSAGE HIDDEN IN HERE ']
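One more thing that may be worth checking, offered as a sketch rather than a definitive diagnosis: with re.X the whitespace in the original pattern is ignored, but the capturing groups still change what findall returns. With groups in the pattern, findall yields one tuple per match containing only the last character each group captured. The example below assumes Python 3 and reuses the sample string from the question; the names `original` and `fixed` are only illustrative.

```python
import re

text = "...˜ÃÕ=òaãNÜ ß§#üxwáã MESSAGE HIDDEN IN HERE ŸÎ=N‰çÈ^XvU…”vN˜..."

# The pattern from the question: re.X drops the line break and indentation,
# so the two groups sit directly next to each other.
original = re.compile('''
    ([A-Z ]){6,}
    ([A-Z ]){6,}
    ''', re.I | re.X)

# With capturing groups, findall returns one tuple per match holding only
# the last character each group captured, not the whole matched text.
print(original.findall(text))

# Non-capturing groups (?:...) make findall return the full match instead.
fixed = re.compile('''
    (?:[A-Z ]){6,}
    (?:[A-Z ]){6,}
    ''', re.I | re.X)
print(fixed.findall(text))
```

If the groups have to stay, iterating with finditer and reading m.group(0) also gives the whole match.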
Upvotes: 4
Reputation: 12874
Try the following regular expression. Using your example I only needed to check one group:
import re
pattern_obj = re.compile('[a-zA-Z ]{6,}', re.I)
extracted_patterns = pattern_obj.findall(ur'your_string')
print extracted_patterns
From your Stack Overflow tag, I assume that you use Python 2. In that case you have to make sure that the string you read in is Unicode.
Output:
[u' MESSAGE HIDDEN IN HERE ']
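For readers on Python 3, here is a minimal sketch of the same single-group idea. It assumes Python 3, where every str is already Unicode and the ur'' prefix is no longer valid syntax. The helper name `extract_messages` and its `min_len` parameter are made up for illustration; they are not part of the re module.

```python
import re

def extract_messages(text, min_len=6):
    """Return runs of at least min_len ASCII letters or spaces, trimmed."""
    pattern = re.compile(r'[a-zA-Z ]{%d,}' % min_len)
    # Strip the surrounding spaces and skip runs that were only spaces.
    return [m.strip() for m in pattern.findall(text) if m.strip()]

sample = "...˜ÃÕ=òaãNÜ ß§#üxwáã MESSAGE HIDDEN IN HERE ŸÎ=N‰çÈ^XvU…”vN˜..."
print(extract_messages(sample))
```

Raising `min_len` filters out more accidental short runs of letters in the junk, at the cost of missing short messages.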
General recommendation: Sometimes it can be difficult to find a good regular expression. The little-known flag re.DEBUG can be very useful in this case, because it prints the parsed pattern when the regex is compiled.
pattern_obj = re.compile('[a-zA-Z ]{6,}', re.DEBUG)
max_repeat 6 4294967295
  in
    range (97, 122)
    range (65, 90)
    literal 32
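A small sketch of the same check, assuming Python 3: the debug dump is printed to standard output at compile time, and its exact layout differs between Python versions, so it is not reproduced here. The test string is invented for illustration.

```python
import re

# Compiling with re.DEBUG prints the parsed pattern to standard output.
pattern_obj = re.compile('[a-zA-Z ]{6,}', re.DEBUG)

# The compiled object is otherwise an ordinary pattern and can be used as usual.
print(pattern_obj.findall("12 HELLO WORLD 34"))
```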
Upvotes: 2
Reputation: 9102
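import re
# Two runs of at least 6 letters or whitespace, with anything (non-greedy) in between.
word = re.compile(r'[a-zA-Z\s]{6,}.+?[a-zA-Z\s]{6,}')
# filename is the path to your file; read it as raw bytes.
filein = open(filename, 'rb').read()
print re.findall(word, filein)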
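On Python 3 the same file-based approach needs a bytes pattern, because a file opened with 'rb' returns bytes and a str pattern cannot search them. The sketch below assumes Python 3 and uses a simpler single-class pattern. The temporary file and its junk bytes are invented so the example can run on its own; in practice `filename` would point at the real file.

```python
import os
import re
import tempfile

# A bytes pattern: in Python 3 a file opened with 'rb' yields bytes,
# and a str pattern cannot be used to search bytes.
word = re.compile(rb'[a-zA-Z ]{6,}')

# Hypothetical test file: a few junk bytes around the message.
junk = b'\x98\xc3\xd5=\xf2a\xe3N\xdc \xdf\xa7#\xfcxw\xe1\xe3'
data = junk + b' MESSAGE HIDDEN IN HERE ' + b'\x9f\xce=N\x89\xe7\xc8^XvU'

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    filename = tmp.name

try:
    # Read the raw bytes and decode each match, which is pure ASCII by construction.
    with open(filename, 'rb') as fh:
        found = [m.decode('ascii') for m in word.findall(fh.read())]
finally:
    os.remove(filename)

print(found)
```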
Upvotes: 0