Reputation: 241
I am trying to find and print all the phone numbers in this file, but the file contains a lot of unreadable text. The file looks like this, only much bigger: e
How can I decode this and find all the numbers? I currently have the following code:
import glob
import re
path = "C:\\Users\\Joey\\Downloads\\db_sdcard\\mysql\\ibdata1"
files = glob.glob(path)
for name in files:
    with open(name, 'r') as f:
        for line in f:
            print line
            match = re.search(r'(/b/d{2}-/d{8}/b)', line)
            if match:
                found = match.group()
                print found
When I run my script, I get the following output:
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ
Where do I have to put the .decode('utf8')?
And is the rest of my code OK?
Upvotes: 0
Views: 68
Reputation: 4130
Try using the following to find your numbers:
re.findall(r"\d{2}-\d{8}", line)
It creates a list of all the matching substrings that fit the format xx-xxxxxxxx, where x is a digit.
When using the last line from your question as an example:
>>> line = ' P t\xe2\x82\xac \xc5\x92 \xc3\x98p\xe2\x82\xac Q~\xc3\x80t\xc3\xb406-23423230xx06-34893646xx secure_encryptedsecure_encrypted\xe2\x82\xac -\xe2\x82\xac -\xe2\x82\xac \n'
>>> re.findall(r"\d{2}-\d{8}", line)
['06-23423230', '06-34893646']
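As a side note on why the pattern in the question finds nothing: it writes /b and /d with forward slashes, which the regex engine treats as literal characters, whereas the digit class and word boundary need backslashes (\d, \b). Below is a small Python 3 sketch of the difference. The sample string is made up for illustration and is not taken from the real file.

```python
import re

# Made-up sample: one number glued to letters, one surrounded by spaces.
sample = "xx06-23423230xx 06-34893646 yy"

# Forward slashes are literal, so this searches for the text "/b/dd-/dddddddd/b".
broken = re.findall(r"/b/d{2}-/d{8}/b", sample)

# Backslashes give the intended meaning: \d is a digit, \b is a word boundary.
no_boundary = re.findall(r"\d{2}-\d{8}", sample)
with_boundary = re.findall(r"\b\d{2}-\d{8}\b", sample)

print(broken)         # []
print(no_boundary)    # ['06-23423230', '06-34893646']
print(with_boundary)  # ['06-34893646']
```

The last result shows that \b rejects a number that touches letters on either side, so leaving \b out, as the pattern here does, also catches numbers embedded in other text.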
Here it is in the full loop:
for name in files:
    # 'rb' reads raw bytes; text mode on Windows can stop early at a 0x1A byte
    with open(name, 'rb') as f:
        for line in f:
            matches = re.findall(r"\d{2}-\d{8}", line)
            for mt in matches:
                print mt
This will print each match on a separate line.
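On the .decode('utf8') question: ibdata1 is a binary InnoDB data file rather than UTF-8 text, so decoding whole lines is likely to fail or produce garbage. A minimal Python 3 sketch, assuming the numbers are stored as plain ASCII digits, is to open the file in binary mode, search with a bytes pattern, and decode only the matches. The function names find_numbers_in_bytes and find_numbers_in_file are made up for illustration.

```python
import re

# Bytes pattern: same xx-xxxxxxxx format, applied to raw bytes.
PHONE_RE = re.compile(rb"\d{2}-\d{8}")

def find_numbers_in_bytes(data):
    """Return all xx-xxxxxxxx matches in a bytes object as text strings."""
    # Each match contains only ASCII digits and '-', so decoding it is safe.
    return [m.decode("ascii") for m in PHONE_RE.findall(data)]

def find_numbers_in_file(path):
    """Read the file in binary mode ('rb') so no text decoding is attempted."""
    with open(path, "rb") as f:
        return find_numbers_in_bytes(f.read())

# Small in-memory sample, modelled on the example line shown earlier.
sample = b"\xff\xff Q~\xc3\x80t\xc3\xb406-23423230xx06-34893646xx\xe2\x82\xac\n"
print(find_numbers_in_bytes(sample))  # ['06-23423230', '06-34893646']
```

With this approach no .decode('utf8') call on the file contents is needed at all; only the short matches are converted to text.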
You could even use findall on the whole file at once:
for name in files:
    # 'rb' reads raw bytes; text mode on Windows can stop early at a 0x1A byte
    with open(name, 'rb') as f:
        matches = re.findall(r"\d{2}-\d{8}", f.read())
        for mt in matches:
            print mt
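Since the file is described as really big, f.read() may need a lot of memory. Here is a hedged Python 3 sketch that scans the file in fixed-size chunks instead. The function name iter_numbers, the default chunk size and the overlap handling are my own assumptions, not something from the question. It carries over the last 10 unmatched bytes of each buffer, so an 11-character number split across two chunks is still found.

```python
import io
import re

PHONE_RE = re.compile(rb"\d{2}-\d{8}")
MATCH_LEN = 11  # 2 digits + '-' + 8 digits

def iter_numbers(stream, chunk_size=1024 * 1024):
    """Yield xx-xxxxxxxx matches from a binary stream, one chunk at a time."""
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        last_end = 0
        for m in PHONE_RE.finditer(buf):
            last_end = m.end()
            yield m.group().decode("ascii")
        # Carry over at most MATCH_LEN - 1 bytes, skipping anything already
        # matched, so a number cut by a chunk boundary is completed next time.
        keep_from = max(len(buf) - (MATCH_LEN - 1), last_end)
        tail = buf[keep_from:]

# Demonstration on an in-memory stream with a tiny chunk size, which
# forces both numbers to straddle chunk boundaries.
demo = io.BytesIO(b"\xff\xff06-23423230xx06-34893646xx\xff")
print(list(iter_numbers(demo, chunk_size=4)))  # ['06-23423230', '06-34893646']
```

For the real file, the stream would be the object returned by open(name, 'rb'), and the loop body would print each yielded number.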
Upvotes: 2