JohnnyFromBF

Reputation: 10191

How to find floating point numbers in binary file with Python?

I have a binary file mixed with ASCII in which there are some floating point numbers I want to find. The file contains some lines like this:

1,1,'11.2','11.3';1,1,'100.4';

In my favorite regex tester I found that the correct regex should be ([0-9]+\.{1}[0-9]+).

Here's the code:

import re

data = open('C:\\Users\\Me\\file.bin', 'rb')
pat = re.compile(b'([0-9]+\.{1}[0-9]+)')
print(pat.match(data.read()))

I do not get a single match, why is that? I'm on Python 3.5.1.

Upvotes: 1

Views: 1006

Answers (2)

jfs

Reputation: 414565

How to find floating point numbers in binary file with Python?

float_re = br"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
for m in generate_tokens(r'C:\Users\Me\file.bin', float_re):
    print(float(m.group()))

where float_re is from this answer and generate_tokens() is defined here.
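
The linked definition of generate_tokens() is not reproduced here. As a rough stand-in (an assumption for illustration, not necessarily the linked code), a generator that reads the file as bytes and yields regex match objects might look like this. It assumes the file fits in memory.

```python
import re


def generate_tokens(filename, pattern):
    """Yield a match object for every occurrence of a bytes pattern in a file.

    Hypothetical stand-in for the linked helper; reads the whole file at once.
    """
    with open(filename, 'rb') as f:
        data = f.read()  # bytes, so the pattern must be bytes as well
    # finditer scans the whole buffer, not just its start
    for m in re.finditer(pattern, data):
        yield m
```

Each yielded match exposes the matched bytes through m.group(), which is what the loop above passes to float().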


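pat.match() only tries to match at the very start of the input. Your data does not start with a float, which is why you "do not get a single match". Use pat.search() to find the first match anywhere in the data, or pat.findall() / pat.finditer() to get all of them.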
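
To make the difference concrete, here is a small sketch. The sample bytes are made up for illustration (modeled on the line from the question, with some leading binary noise), not taken from the real file.

```python
import re

# Made-up sample: binary noise followed by the ASCII line from the question.
data = b"\x00\x01 1,1,'11.2','11.3';1,1,'100.4';"
pat = re.compile(br"[0-9]+\.[0-9]+")

print(pat.match(data))            # None: the data does not start with a float
print(pat.search(data).group())   # b'11.2': first match anywhere in the data
print(pat.findall(data))          # [b'11.2', b'11.3', b'100.4']
```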


re.findall("\d+\.\d+", data) raises a TypeError because the pattern is Unicode (str) but data is a bytes object in your case. Pass the pattern as bytes (a raw literal keeps the backslashes intact):
re.findall(br"\d+\.\d+", data)
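
A short sketch of both cases, using made-up sample bytes in place of the file contents:

```python
import re

# Made-up sample standing in for the bytes read from the file.
data = b"1,1,'11.2','11.3';1,1,'100.4';"

try:
    re.findall(r"\d+\.\d+", data)      # str pattern on bytes data
except TypeError as exc:
    print("TypeError:", exc)

# bytes pattern on bytes data works; float() accepts bytes directly
floats = [float(tok) for tok in re.findall(br"\d+\.\d+", data)]
print(floats)                          # [11.2, 11.3, 100.4]
```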

Upvotes: 2

Adem Öztaş

Reputation: 21466

You can try it like this:

import re
with open('C:\\Users\\Me\\file.bin', 'rb') as f:
    data = f.read()

print(re.findall(rb"\d+\.\d+", data))

Output:

[b'11.2', b'11.3', b'100.4']

On Python 3 the pattern has to be bytes because data is bytes, and re.findall returns a list of bytes objects. If you want to convert them to floats, you can do it like this:

>>> list(map(float, re.findall(rb"\d+\.\d+", data)))
[11.2, 11.3, 100.4]

Upvotes: 2
