Can pandas read c++ binary file directly?

Question

I have a large file that is written by my C++ code.

It saves structs to the file in binary format.

For example:

struct A {
  char name[32];
  int age;
  double height;
};

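The output code looks like this:

std::fstream f;  // opened elsewhere for binary output
for (int i = 0; i < 10000000; ++i) {
  A a;
  // write() takes a const char*, so the struct pointer must be cast
  f.write(reinterpret_cast<const char*>(&a), sizeof(a));
}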

I want to process it in Python with a pandas DataFrame.

Is there a good method to read it elegantly?

Pietro · Accepted Answer

Searching for read_bin I found this issue that suggests using np.fromfile to load the data into a numpy array, then converting to a dataframe:

import numpy as np
import pandas as pd

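dt = np.dtype(
    [
        ("name", "S32"),   # 32-byte char array, zero-padded
        ("age", "i4"),     # 32-bit signed integer
        ("height", "f8"),  # 64-bit floating-point number
    ],
    # C compilers usually pad after `age` so that `height` is 8-byte aligned
    # (sizeof(A) is then 48, not 44); align=True makes numpy add that padding.
    align=True,
)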

records = np.fromfile("filename.bin", dt)
df = pd.DataFrame(records)

Please note that I have not tested this code, so there could be some problems in the data types I picked:

the byte order might be different (big/little endian); it can be set explicitly per field, e.g. dt = np.dtype([('big', '>i4'), ('little', '<i4')])


the char array is read as a null-terminated byte string, which I think ends up as a bytes object in Python, so you might want to convert it to a string (using df['name'] = df['name'].str.decode('utf-8'))
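
These caveats can be checked on a small synthetic file before touching the real one. The following is a minimal sketch, assuming a little-endian x86-64 compiler that lays the struct out in 48 bytes (4 padding bytes after age); it emulates two such records with Python's struct module and reads them back. The sample values and the temporary file are made up for illustration.

```python
import os
import struct
import tempfile

import numpy as np
import pandas as pd

# align=True asks numpy to insert C-style padding between the fields.
dt = np.dtype(
    [("name", "S32"), ("age", "<i4"), ("height", "<f8")],
    align=True,
)

# Assumed C++ layout: 32 name bytes, 4-byte int, 4 padding bytes,
# 8-byte double. "<" disables automatic alignment in the struct module,
# so the padding is spelled out explicitly with "4x".
layout = struct.Struct("<32si4xd")
raw = layout.pack(b"alice", 30, 1.65) + layout.pack(b"bob", 41, 1.80)

# Write the emulated records to a throwaway file.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    records = np.fromfile(path, dtype=dt)
finally:
    os.remove(path)

df = pd.DataFrame(records)
# The S32 field arrives as bytes; decode it to get Python strings.
df["name"] = df["name"].str.decode("utf-8")
print(dt.itemsize, dt.fields["height"][1])
print(df)
```

If dt.itemsize does not match sizeof(A) printed from the C++ side, the layout assumption is wrong for that compiler and the dtype needs adjusting.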


More info on the data types can be found in the numpy docs.
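
Since the file in the question holds ten million records, loading it in one go may use more memory than you want. One option, sketched here under the same layout assumptions, is np.memmap, which maps the file and lets you convert it slice by slice. The helper name iter_frames and the chunk size are hypothetical, not part of numpy or pandas.

```python
import numpy as np
import pandas as pd

# Same assumed layout as the struct in the question (48 bytes per record).
dt = np.dtype(
    [("name", "S32"), ("age", "i4"), ("height", "f8")],
    align=True,
)


def iter_frames(path, dtype, chunk_size=1_000_000):
    """Yield DataFrames holding at most chunk_size records each."""
    # mode="r" maps the file read-only; the record count is inferred
    # from the file size, which must be a multiple of dtype.itemsize.
    mm = np.memmap(path, dtype=dtype, mode="r")
    for start in range(0, len(mm), chunk_size):
        # np.array copies only this slice into ordinary memory.
        chunk = np.array(mm[start:start + chunk_size])
        yield pd.DataFrame(chunk)
```

Each yielded frame can then be filtered or aggregated on its own, e.g. for frame in iter_frames("filename.bin", dt): ...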
Cheers!
