user24470430

Reputation: 21

How to write DataTable with blob/clob/nclob columns to parquet file using ChoParquetWriter

I want to save a copy of the tables in an on-premises Oracle database as Parquet files. The database has many different tables and owners, so I decided to use DataTable to keep the process as simple as possible. I am using ChoParquetWriter from ChoETL. Unfortunately, it throws this exception

ChoETL.ChoMissingRecordFieldException: „No matching property found in the object for '<column_name>' Parquet column.”

while writing a Parquet file from a DataTable object. The DataTable contains a column of type BLOB.

I load the DataTable with this function:

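// Requires System.Data and the namespace of the Oracle ADO.NET provider in use.
public DataTable GetDataTableFromOracleTable(string connectionString, string owner, string tableName)
{
    // Name the table after its source so it can be identified later.
    DataTable returnedTable = new DataTable($"{owner}_{tableName}");
    // The using blocks dispose the connection, command and adapter even if an exception is thrown.
    using (OracleConnection connection = new OracleConnection(connectionString))
    {
        connection.Open();
        // owner and tableName are interpolated into the SQL text, so they must come from a trusted source.
        using (OracleCommand dataCmd = new OracleCommand($"SELECT * FROM {owner}.{tableName}", connection))
        using (OracleDataAdapter adapter = new OracleDataAdapter(dataCmd))
        {
            adapter.Fill(returnedTable);
        }
    }
    return returnedTable;
}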

Then I try to write this DataTable to a Parquet file:

using (var parser = new ChoParquetWriter(parquetSourcePath)
    .Configure(c => c.CompressionMethod = Parquet.CompressionMethod.Gzip)
    .Setup(s => s.BeforeRecordFieldWrite += (o, e) =>
    {
        if (e.Source == DBNull.Value)
            e.Source = null;
    })
    )
{
    parser.Write(testTable);
}

I have tried searching Google and asking ChatGPT, but I have run out of ideas.

How can I write the Parquet file without losing the data in the BLOB column?

Upvotes: 1

Views: 198

Answers (1)

Cinchoo

Reputation: 6322

There is a subtle issue with how the library handles arrays of value types. A fix will be available in the next release. In the meantime, you can work around the issue by calling UseNestedKeyFormat(false):

using (var parser = new ChoParquetWriter(parquetSourcePath)
    .Configure(c => c.CompressionMethod = Parquet.CompressionMethod.Gzip)
    .Setup(s => s.BeforeRecordFieldWrite += (o, e) =>
    {
        if (e.Source == DBNull.Value)
            e.Source = null;
    })
    .UseNestedKeyFormat(false)
    )
{
    parser.Write(testTable);
}
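
To see which columns are likely to hit the value type array issue, it can help to list the DataTable columns whose .NET type is an array of a value type. The sketch below uses only System.Data. The names ValueArrayColumnFinder and FindValueTypeArrayColumns are made up for illustration and are not part of ChoETL, and it assumes the Oracle provider surfaces BLOB columns as byte[], which is worth confirming against your own table.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper for illustration only; not part of ChoETL.
public static class ValueArrayColumnFinder
{
    // Returns the names of all columns whose data type is an array
    // of a value type (for example byte[]).
    public static List<string> FindValueTypeArrayColumns(DataTable table)
    {
        var names = new List<string>();
        foreach (DataColumn column in table.Columns)
        {
            Type type = column.DataType;
            if (type.IsArray && type.GetElementType().IsValueType)
            {
                names.Add(column.ColumnName);
            }
        }
        return names;
    }
}
```

Calling ValueArrayColumnFinder.FindValueTypeArrayColumns(testTable) before writing gives the column names to check in the resulting Parquet file.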

Upvotes: 0
