joey

Reputation: 241

Python Regex And encode

I am trying to find and print all the phone numbers in this file, but the file contains a lot of unreadable text and is really big.

How can I decode this and find all the numbers? This is my current code:

import glob
import re

path = "C:\\Users\\Joey\\Downloads\\db_sdcard\\mysql\\ibdata1"
files = glob.glob(path)
for name in files:
    with open(name, 'r') as f:
        for line in f:
            print(line)
            match = re.search(r'(/b/d{2}-/d{8}/b)', line)
            if match:
                found = match.group()
                print(found)

When I run my script, I get the following output:

ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ

Where do I have to put the .decode('utf8')? And is the rest of my code correct?

Upvotes: 0

Views: 68

Answers (1)

Zach Gates

Reputation: 4130

Try using the following to find your numbers:

re.findall(r"\d{2}-\d{8}", line)

It returns a list of all non-overlapping substrings that match the format xx-xxxxxxxx, where each x is a digit.


Using the last line from your question as an example:

>>> line = '   P t\xe2\x82\xac         \xc5\x92  \xc3\x98p\xe2\x82\xac Q~\xc3\x80t\xc3\xb406-23423230xx06-34893646xx secure_encryptedsecure_encrypted\xe2\x82\xac  -\xe2\x82\xac  -\xe2\x82\xac  \n'
>>> re.findall(r"\d{2}-\d{8}", line)
['06-23423230', '06-34893646']
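To see why the pattern in the question finds nothing, here is a small comparison on a made-up bytes sample modelled on the line quoted above (the sample itself is an assumption, not real data from ibdata1). With forward slashes, /b and /d are literal characters rather than a word boundary and a digit class. Even the backslash version \b...\b would miss these numbers, because the digits sit directly next to letters such as t and x, so there is no word boundary there.

```python
import re

# Made-up sample in the spirit of the question's data: numbers glued to letters
sample = b"\xff\xfe Q~t06-23423230xx06-34893646xx secure_encrypted\x00"

# Pattern from the question: '/' is not an escape character, so this looks for
# the literal text "/b/dd-/dddddddd/b"
print(re.findall(rb"(/b/d{2}-/d{8}/b)", sample))  # → []

# Backslash version with word boundaries: between 't' and '0' (and between
# '0' and 'x') there is no boundary, because both sides are word characters
print(re.findall(rb"\b\d{2}-\d{8}\b", sample))  # → []

# Pattern from this answer: no boundary requirement, so both numbers match
print(re.findall(rb"\d{2}-\d{8}", sample))  # → [b'06-23423230', b'06-34893646']
```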

Here it is in your full loop:

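for name in files:
    # ibdata1 is a binary file, so open it in binary mode ('rb')
    with open(name, 'rb') as f:
        for line in f:
            # findall returns every non-overlapping match on this line
            matches = re.findall(br"\d{2}-\d{8}", line)
            for mt in matches:
                # decode only the matched digits, not the whole file
                print(mt.decode('ascii'))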

This will print each match on separate lines.


You could also run findall on the whole file at once:

for name in files:
    with open(name, 'rb') as f:
        # read the whole file into one bytes object and search it once
        matches = re.findall(br"\d{2}-\d{8}", f.read())
        for mt in matches:
            print(mt.decode('ascii'))
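Since the file is described as really big, f.read() loads all of it into memory at once. A minimal sketch of one alternative, assuming Python 3 and that the numbers are stored as plain ASCII digits: memory-map the file with the standard mmap module and run a bytes pattern over it directly, decoding only the matches. The helper name find_phone_numbers is hypothetical.

```python
import mmap
import re

# Bytes pattern: two ASCII digits, a hyphen, eight ASCII digits
PHONE_PATTERN = re.compile(rb"\d{2}-\d{8}")


def find_phone_numbers(path):
    """Hypothetical helper: return every xx-xxxxxxxx number in the file at `path`.

    The file is memory-mapped instead of being read into one bytes object,
    so the operating system pages it in as the regex scans it.
    Note: mmap cannot map an empty file (it raises ValueError).
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # re accepts mmap objects because they support the buffer protocol;
            # findall returns bytes, which are decoded to str here
            return [m.decode("ascii") for m in PHONE_PATTERN.findall(mm)]


if __name__ == "__main__":
    import glob

    for name in glob.glob(r"C:\Users\Joey\Downloads\db_sdcard\mysql\ibdata1"):
        for number in find_phone_numbers(name):
            print(number)
```

The list of matches is still built in memory, but the file contents are not copied into a single bytes object first.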

Upvotes: 2
