Reputation: 1611
I am using the ChoETL and ChoETL.Parquet libraries to create a Parquet file from some other data. Creating the file locally works fine:
using (ChoParquetWriter parser = new ChoParquetWriter($"..\\..\\..\\parquet_files\\{club}_events.parquet"))
{
    parser.Write(events);
}
In this code snippet, events is a list of objects containing strings. They will be converted to parquet data.
So far I have written the code to upload to Azure, but it needs a local file as input.
BlobServiceClient blobServiceClient = new BlobServiceClient("REDACTED");
var containerClient = blobServiceClient.GetBlobContainerClient("base-test");
BlobClient blobClient = containerClient.GetBlobClient($"Base/{RequestTime.Year}/{RequestTime.Month}/{RequestTime.Day}/{RequestTime.Hour}/{RequestTime.Minute}/events.parquet");
// The using declaration disposes the stream at the end of the scope, so no explicit Close() is needed.
using FileStream uploadFileStream = File.OpenRead("..\\..\\..\\events.parquet");
await blobClient.UploadAsync(uploadFileStream, true);
I need the file to be created in memory and then uploaded to Azure Blob Storage. How can I do this? To clarify: it is the Parquet file itself that needs to be uploaded.
Upvotes: 1
Views: 1673
Reputation: 23141
You can use the method BlockBlobClient.OpenWriteAsync to get a writable stream and pass that stream to ChoParquetWriter. The writer then writes directly to the Azure blob, so no local file is needed.
For example:
// Namespaces this example relies on.
using System;
using System.Collections.Generic;
using System.Web;                        // MimeMapping (.NET Framework)
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;        // BlobHttpHeaders, BlockBlobOpenWriteOptions
using Azure.Storage.Blobs.Specialized;   // GetBlockBlobClient
using ChoETL;

// Sample records to serialize.
var objs = new List<EmployeeRecSimple>
{
    new EmployeeRecSimple { Id = 1, Name = "Mark" },
    new EmployeeRecSimple { Id = 2, Name = "Jason" },
};

// connectionString is your storage account connection string.
var blobServiceClient = new BlobServiceClient(connectionString);
var desContainer = blobServiceClient.GetBlobContainerClient("output");
var desBlob = desContainer.GetBlockBlobClient("my.parquet");

var options = new BlockBlobOpenWriteOptions
{
    HttpHeaders = new BlobHttpHeaders
    {
        ContentType = MimeMapping.GetMimeMapping("parquet"),
    },
    // Progress updates about data transfers
    ProgressHandler = new Progress<long>(
        progress => Console.WriteLine("Progress: {0} bytes written", progress))
};

// Open a writable stream on the blob (overwriting any existing blob)
// and let the Parquet writer write to it directly.
using (var outStream = await desBlob.OpenWriteAsync(true, options).ConfigureAwait(false))
using (ChoParquetWriter parser = new ChoParquetWriter(outStream))
{
    parser.Write(objs);
}

public partial class EmployeeRecSimple
{
    public int Id { get; set; }
    public string Name { get; set; }
}
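If you would rather build the whole file in memory first and upload it afterwards, as the question describes, something like the following sketch might work. It assumes the generic ChoParquetWriter<T> accepts any writable Stream in the same way the non-generic writer above does. ParquetUploader and UploadAsParquetAsync are hypothetical names chosen for illustration, not part of ChoETL or the Azure SDK.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using ChoETL;

// Hypothetical helper class, not part of ChoETL or the Azure SDK.
public static class ParquetUploader
{
    public static async Task UploadAsParquetAsync<T>(
        IEnumerable<T> records, BlobContainerClient container, string blobName)
        where T : class
    {
        byte[] bytes;
        using (var buffer = new MemoryStream())
        {
            // Assumption: ChoParquetWriter<T> takes a Stream, like the
            // non-generic writer used in the example above.
            using (var writer = new ChoParquetWriter<T>(buffer))
            {
                writer.Write(records);
            }
            // Copy the bytes only after the writer is disposed, so the
            // Parquet data is complete. ToArray() also works on a
            // MemoryStream that has already been closed.
            bytes = buffer.ToArray();
        }

        // Upload the finished file from a fresh in-memory stream.
        BlobClient blob = container.GetBlobClient(blobName);
        using (var upload = new MemoryStream(bytes))
        {
            await blob.UploadAsync(upload, overwrite: true);
        }
    }
}
```

It could then be called as await ParquetUploader.UploadAsParquetAsync(objs, desContainer, "my.parquet"). This variant holds the entire file in memory, so for large data sets the streaming approach with OpenWriteAsync is probably the better fit.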
Upvotes: 3