Reputation: 656
I want to optimize the reading of a column in Parquet using the ReadBatch method.
You pass in the number of rows to read:
int64_t cnt = reader->ReadBatch(10, nullptr, nullptr, &value, &values_read);
In this case, I am asking for 10 rows, and the return value is the number actually read.
Is there a way to get the number of rows in the Row Group before the read?
Upvotes: 0
Views: 364
Reputation: 656
Use the metadata() method on the ParquetFileReader or the RowGroupReader to get the number of rows:
// Total Rows for Parquet File
std::unique_ptr<parquet::ParquetFileReader> parquet_reader = ...;
std::shared_ptr<parquet::FileMetaData> file_metadata = parquet_reader->metadata();
int64_t total_num_rows = file_metadata->num_rows();  // num_rows() returns int64_t
// Rows for specific Row Group
std::shared_ptr<parquet::RowGroupReader> row_group_reader = ...;
auto rgMetaData = row_group_reader->metadata();
int64_t rowGroupNumRows = rgMetaData->num_rows();  // num_rows() returns int64_t
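Once the row count is known, it can be used to size the output buffer up front and to decide how many rows to request from ReadBatch. Below is a rough sketch of that idea, not a definitive implementation. ReadColumn and FakeInt64Reader are hypothetical names introduced here. The fake reader only mimics the ReadBatch/HasNext shape so the sketch compiles without Arrow, and it assumes a required (non-null), non-repeated int64 column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a typed Parquet column reader (not part of Arrow).
// It hands out at most max_per_call rows per call to imitate a reader that
// returns fewer rows than requested.
struct FakeInt64Reader {
    std::vector<int64_t> data;  // pretend column contents
    int64_t pos = 0;            // index of the next row to hand out
    int64_t max_per_call = 4;   // artificial cap per ReadBatch call

    bool HasNext() const { return pos < static_cast<int64_t>(data.size()); }

    int64_t ReadBatch(int64_t batch_size, int16_t* /*def_levels*/,
                      int16_t* /*rep_levels*/, int64_t* values,
                      int64_t* values_read) {
        const int64_t remaining = static_cast<int64_t>(data.size()) - pos;
        const int64_t n =
            std::min(std::min(batch_size, max_per_call), remaining);
        std::copy(data.begin() + pos, data.begin() + pos + n, values);
        pos += n;
        *values_read = n;
        return n;
    }
};

// Reads up to num_rows values into a buffer sized from the row count that the
// metadata reported. It loops because one ReadBatch call may return fewer rows
// than requested.
template <typename Reader>
std::vector<int64_t> ReadColumn(Reader& reader, int64_t num_rows) {
    assert(num_rows >= 0);
    std::vector<int64_t> values(static_cast<std::size_t>(num_rows));
    int64_t total = 0;
    while (total < num_rows && reader.HasNext()) {
        int64_t values_read = 0;
        const int64_t rows = reader.ReadBatch(num_rows - total, nullptr, nullptr,
                                              values.data() + total, &values_read);
        if (rows == 0) break;  // nothing more to read
        total += values_read;  // equals rows for a required column
    }
    values.resize(static_cast<std::size_t>(total));
    return values;
}
```

With the real library, the idea would be to pass rowGroupNumRows as num_rows and a parquet::Int64Reader (obtained from row_group_reader->Column(i)) as the reader. Nullable or repeated columns would also need definition and repetition level buffers, which this sketch leaves out.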
Upvotes: 1