A.A.

Reputation: 165

Qt C++: what is a good way to read huge data from files and store it in memory?

I have audio data and I am not sure of the best way to store it as a matrix. I have 4 large files of recordings from acoustic sensors; each file holds 4 channels of interleaved data. I am using Qt C++ to do some processing on this data. So far I have tried this approach, using a QVector of QVectors to store the data:

QVector<QVector<int>> buffer(16);   // 4 * 4 : numberOfChannels * numberOfFiles

for (int i = 0; i < 4; i++) {
    QFile file(fileList[i]);        // fileList is a QList of QStrings containing the 4 file paths
    if (file.open(QIODevice::ReadOnly)) {
        int k = 0;
        while (!file.atEnd()) {
            QByteArray sample = file.read(depth/8);         // depth here is 24, so 3 bytes per sample
            int integerSample = convertByteArrayToIntFunction(sample);
            buffer[4 * i + (k % 4)].append(integerSample);  // samples are channel-interleaved
            k++;
        }
    }
}
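For reference, convertByteArrayToIntFunction builds a signed int from the 3 raw bytes. A simplified sketch of what it does, assuming 24-bit little-endian signed samples (the real code may differ in details):

#include <QByteArray>
#include <QtGlobal>

// Simplified sketch of the conversion helper, assuming 24-bit
// little-endian signed samples.
int convertByteArrayToIntFunction(const QByteArray &sample)
{
    // Assemble the three bytes into the low 24 bits of an unsigned value.
    quint32 raw =  quint8(sample[0])
                | (quint8(sample[1]) << 8)
                | (quint8(sample[2]) << 16);
    // Sign-extend from 24 to 32 bits.
    if (raw & 0x800000)
        raw |= 0xFF000000;
    return int(raw);
}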

The goal is to end up with this matrix of 16 columns, like below (f: file, c: channel):

f1c0 | f1c1 | f1c2 | f1c3 | f2c0 | f2c1 | ... | f4c2 | f4c3

But this approach takes ages for large files of a few gigabytes. I am wondering if there is a more efficient way to accomplish this task and save a lot of time. From what I have found, I could split the reading into chunks, but how to do that is still not clear to me. Thanks in advance.

Upvotes: -1

Views: 462

Answers (1)

Botje

Reputation: 31020

There are two obvious antipatterns in your code.

The first one is not pre-sizing your QVectors. This means that every so often a call to append will find that the vector's storage is full, which triggers an allocation of a larger backing store and a copy of the existing contents before the append can complete. You know in advance how many samples are in each file, so you can use QVector::reserve to allocate the right amount up front and avoid this behavior:

const int bps = depth / 8;  // bytes per sample
QFile file(fileList[i]);
auto numSamples = file.size() / bps / 4; // "depth" bits per sample and 4 channels
for (int j = 0; j < 4; j++) {
  buffer[4 * i + j].reserve(numSamples);
}

Secondly, you are calling file.read() for every sample. This means you are repeatedly paying the cost of retrieving data (although buffering will alleviate this a bit) and that of allocating a QByteArray. Instead, read a huge chunk of the file at once and then loop over that:

while (!file.atEnd()) {
  QByteArray samples = file.read(1'000'000 * 4 * bps); // read up to a million frames (4 samples each) at once
  for (int k = 0; k * bps < samples.size(); k++) {
    QByteArray sample = samples.mid(k * bps, bps);
    buffer[4 * i + (k % 4)].append(convertByteArrayToIntFunction(sample));
  }
}

You can play around with the 1'000'000 number to see if another chunk size performs better, and you can probably gain a few percent more performance by passing convertByteArrayToIntFunction a const char * instead of a QByteArray, but more readable is probably better.
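A minimal sketch of that const char * variant, assuming the same 24-bit little-endian signed samples as above (convertSample is just an illustrative name, not part of your code):

// Illustrative const char * variant: decodes one sample without
// allocating a temporary QByteArray per sample.
inline int convertSample(const char *p)
{
    quint32 raw = quint8(p[0]) | (quint8(p[1]) << 8) | (quint8(p[2]) << 16);
    if (raw & 0x800000)          // sign-extend from 24 to 32 bits
        raw |= 0xFF000000;
    return int(raw);
}

// The inner loop can then index directly into the chunk:
const char *data = samples.constData();
for (int k = 0; k * bps < samples.size(); k++) {
    buffer[4 * i + (k % 4)].append(convertSample(data + k * bps));
}

This removes one heap allocation per sample; whether that gain is worth the slightly lower readability is up to you.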

Upvotes: 1
