Python Regex And encode

Question

I am trying to find and print all the phone numbers in this file, but the file contains a lot of unreadable text. The file looks like this, only much bigger:

[screenshot of the file contents not preserved]

How can I decode this and find all the numbers? This is my code so far:

import glob
import re

path = "C:\Users\Joey\Downloads\db_sdcard\mysql\ibdata1"
files= glob.glob(path)
for name in files:
        with open(name, 'r') as f:
            for line in f:
                print line
                match = re.search(r'(/b/d{2}-/d{8}/b)', line)
                if match:
                    found = match.group()
                    print found

When I run my script, I get the following output:

ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ

Where do I have to put .decode('utf8')? And is the rest of my code correct?

Zach Gates · Accepted Answer

Try using the following to find your numbers:

re.findall(r"\d{2}-\d{8}", line)

It returns a list of all substrings that match the format xx-xxxxxxxx, where each x is a digit.
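A quick check of why the pattern in the question never matched, assuming Python 3: in '/b/d{2}-/d{8}/b' the forward slashes are literal characters, so "/d{2}" means a "/" followed by two "d" letters, not two digits. Regex escapes such as \d and \b need a backslash. The sample string here is made up for illustration.

```python
import re

# Made-up sample line containing two numbers in the xx-xxxxxxxx format.
sample = "junk 06-23423230xx06-34893646xx junk"

# Pattern from the question: "/" is literal, so "/d{2}" means "/dd".
broken = r"(/b/d{2}-/d{8}/b)"
# Backslash escapes: \d is "any digit".
fixed = r"\d{2}-\d{8}"

print(re.findall(broken, sample))  # []
print(re.findall(fixed, sample))   # ['06-23423230', '06-34893646']

# The broken pattern only ever matches this literal text.
print(re.findall(broken, "/b/dd-/dddddddd/b"))  # ['/b/dd-/dddddddd/b']
```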

Using the last line from your question as an example:

>>> line = '   P t\xe2\x82\xac         \xc5\x92  \xc3\x98p\xe2\x82\xac Q~\xc3\x80t\xc3\xb406-23423230xx06-34893646xx secure_encryptedsecure_encrypted\xe2\x82\xac  -\xe2\x82\xac  -\xe2\x82\xac  '
>>> re.findall(r"\d{2}-\d{8}", line)
['06-23423230', '06-34893646']
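The pattern in the question also wrapped the number in \b word boundaries. A small check, assuming Python 3 and using an excerpt of the sample line above, suggests why leaving them out is reasonable for this data: each number is immediately followed by "xx", and since digits and letters are both word characters, there is no boundary after the last digit.

```python
import re

# Excerpt of the sample line above, as a Python 3 str.
line = "Q~\xc3\x80t\xc3\xb406-23423230xx06-34893646xx secure_encrypted"

# With \b on both sides: "0" followed by "x" is word char next to word char,
# so the trailing \b can never match here.
print(re.findall(r"\b\d{2}-\d{8}\b", line))  # []

# Without \b, both numbers are found.
print(re.findall(r"\d{2}-\d{8}", line))  # ['06-23423230', '06-34893646']
```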

Here it is inside your full loop:

for name in files:
    with open(name, 'r') as f:
        for line in f:
            # findall returns every non-overlapping match in the line
            matches = re.findall(r"\d{2}-\d{8}", line)
            for mt in matches:
                print(mt)

This prints each match on its own line.
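On the .decode('utf8') part of the question: ibdata1 is a binary MySQL (InnoDB) data file, not UTF-8 text, so decoding the whole file is likely to fail or produce garbage. A minimal sketch, assuming Python 3, that avoids decoding the file at all: open it in binary mode, search with a bytes pattern, and decode only the matches, which contain nothing but ASCII digits and a hyphen. The function name find_numbers is hypothetical.

```python
import re

# Bytes pattern: two ASCII digits, a hyphen, then eight ASCII digits.
NUMBER_RE = re.compile(rb"\d{2}-\d{8}")


def find_numbers(path):
    """Return every xx-xxxxxxxx number found in a binary file, as str."""
    numbers = []
    # "rb" yields raw bytes; nothing is decoded while reading.
    with open(path, "rb") as f:
        # Iteration splits on b"\n"; a match never contains a newline,
        # so no number is cut in half by this split.
        for chunk in f:
            for match in NUMBER_RE.findall(chunk):
                # Matches are pure ASCII, so this decode cannot fail.
                numbers.append(match.decode("ascii"))
    return numbers
```

To use it, pass the path from the question (as a raw string, r"C:\Users\..."), then print each number in the returned list.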

You could also run findall over the whole file at once:

for name in files:
    with open(name, 'r') as f:
        # f.read() returns the entire file as a single string
        matches = re.findall(r"\d{2}-\d{8}", f.read())
        for mt in matches:
            print(mt)
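Reading the whole file with f.read() copies all of it into memory, which may matter since the file is described as really big. One alternative, sketched here assuming Python 3, is to memory-map the file with the standard mmap module and run a bytes pattern over the mapping; the re module can search an mmap object directly, and the operating system pages the file in as needed instead of Python building one huge string. The function name find_numbers_mmap is hypothetical.

```python
import mmap
import os
import re

# Bytes pattern: two ASCII digits, a hyphen, then eight ASCII digits.
NUMBER_RE = re.compile(rb"\d{2}-\d{8}")


def find_numbers_mmap(path):
    """Return every xx-xxxxxxxx number in the file as a list of str."""
    with open(path, "rb") as f:
        # mmap refuses to map an empty file, so handle that case first.
        if os.fstat(f.fileno()).st_size == 0:
            return []
        # Length 0 maps the whole file; ACCESS_READ keeps it read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # findall over the mapping returns a list of bytes matches.
            return [m.decode("ascii") for m in NUMBER_RE.findall(mm)]
```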
