Reputation: 13620
I currently have an Event Hub instance set up in Azure. It has 5 partitions. What I want to know is whether the PartitionKey always has to be a number between 0 and n-1, with n being the number of partitions.
I have the following code:
    private static async Task SendMessagesToEventHub(int numMessagesToSend)
    {
        var sender = eventHubClient.CreatePartitionSender("test1");
        for (var i = 0; i < numMessagesToSend; i++)
        {
            try
            {
                var message = $"Message {i}";
                Console.WriteLine($"Sending message: {message}");
                await sender.SendAsync(new EventData(Encoding.UTF8.GetBytes(message)));
            }
            catch (Exception exception)
            {
                Console.WriteLine($"{DateTime.Now} > Exception: {exception.Message}");
            }
            await Task.Delay(10);
        }
        Console.WriteLine($"{numMessagesToSend} messages sent.");
    }
This throws the following exception:
The specified partition is invalid for an EventHub partition sender or receiver. It should be between 0 and 4.
The Event Hubs documentation says this about the PartitionKey:
The EventData class has a PartitionKey property that enables the sender to specify a value that is hashed to produce a partition assignment. Using a partition key ensures that all the events with the same key are sent to the same partition in the Event Hub. Common partition keys include user session IDs and unique sender IDs.
To me this means that you are not limited to an int but can use any string. What am I missing?
Upvotes: 4
Views: 8224
Reputation: 2509
If you don't want the default round-robin logic and want to distribute messages across all partitions using your own custom logic, you can send each message to a particular partition ID in a similar way. You cannot also assign a partition key to the EventData in this case.
You have to come up with your own logic that maps each message to a partition ID so that messages are spread across all partitions:
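    // GetPartitionId returns an int; the sender takes the partition ID as a string.
    string partitionId = GetPartitionId(message).ToString();
    // message is an object here, so serialize it first (convertToJson is the helper used in the second snippet).
    EventData eventData = new EventData(Encoding.UTF8.GetBytes(convertToJson(message)));
    // CreatePartitionedSender is the older Microsoft.ServiceBus.Messaging name;
    // in Microsoft.Azure.EventHubs the equivalent call is CreatePartitionSender.
    await eventHubClient.CreatePartitionedSender(partitionId).SendAsync(eventData);

    private static int GetPartitionId(Message message)
    {
        // Your own custom logic; here, the last 5 characters of a 17-character VIN.
        var svin = message.vin.Substring(12, 5);
        int partKey;
        // TryParse sets partKey to 0 on failure, so unparsable values fall back to partition 0.
        if (int.TryParse(svin, out partKey))
        {
            // Math.Abs guards against a negative remainder if svin parses as a negative number.
            partKey = Math.Abs(partKey % NumberOfPartitions);
        }
        return partKey;
    }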
Or you can set a partition key on the EventData and Event Hubs will distribute the events across partitions for you. EventData with the same partition key will always go to the same partition:
    string payLoadJson = convertToJson(record);
    EventData eventData = new EventData(Encoding.UTF8.GetBytes(payLoadJson));
    eventData.PartitionKey = record.vin;
    await eventHubClient.SendAsync(eventData);
Upvotes: 0
Reputation: 4993
Answer:
You cannot mix PartitionKey and PartitionSender; they are two mutually exclusive concepts.
Don't use the PartitionSender API (i.e. ehClient.CreatePartitionSender()), which was designed to send to a specific partition (in which case the Event Hubs service can no longer use the PartitionKey to hash to a partition).
Instead, use this C# code snippet:
    EventData myEvent = new EventData(Encoding.UTF8.GetBytes(message));
    myEvent.PartitionKey = "test1";
    await eventHubClient.SendAsync(myEvent);
We learned that this API was a bit confusing for our customers to grasp, so when we built our Java SDK we corrected and simplified the API to look like this:
    EventData myEvent = new EventData(message.getBytes(Charset.defaultCharset()));
    eventHubClient.sendSync(myEvent, "test1");
The 3 types of Send Patterns Exposed by Event Hubs:
When we developed the Event Hubs service, we wanted to give our users multiple levels of control over partitioning their event stream. We came up with the 3 modes below (shown with our C# client APIs):
1. EventHubClient.Send(eventData_Without_PartitionKey): use this when you don't need any control over how data is partitioned. The Event Hubs service will try to distribute data uniformly across all partitions (best effort, no guarantees). In exchange for giving up control over partitioning, you gain high availability. If you have an Event Hub with 32 partitions and send this way, your event will be delivered to one of the 32 partitions that is immediately available and has the least data on it.
2. EventHubClient.Send(eventData_With_PartitionKey): use this when your data has a property by which you want to partition it. Here the user controls partitioning by specifying a hint (the PartitionKey), which our service runs through a hash algorithm to pick the partition. All EventData instances with the same PartitionKey are guaranteed to land on the same Event Hubs partition.
3. EventHubSender.Send(eventData_Without_PartitionKey) (EventHubPartitionSender would have been a more apt name): use this when you want complete control over partitioning, that is, over which EventData lands on which Event Hubs partition. This is typically used when customers have their own proprietary hash algorithm that they believe performs better for their scenarios with respect to fairness of load distribution across all Event Hubs partitions.
What you need is (2).
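To illustrate why any string works as a partition key in that mode, here is a minimal, self-contained sketch of the general idea of hashing a key to a partition index. The class name PartitionKeySketch, the method ToPartitionIndex, and the choice of FNV-1a are all assumptions made for illustration. The Event Hubs service uses its own internal hash, so the indexes printed here will not match where the service actually places events.

```csharp
using System;
using System.Text;

public static class PartitionKeySketch
{
    // Illustrative stand-in for the service-side hash (assumption: 32-bit FNV-1a).
    // This is NOT the algorithm Event Hubs uses; it only shows the key -> partition idea.
    public static int ToPartitionIndex(string partitionKey, int partitionCount)
    {
        if (partitionKey == null) throw new ArgumentNullException(nameof(partitionKey));
        if (partitionCount <= 0) throw new ArgumentOutOfRangeException(nameof(partitionCount));

        unchecked
        {
            uint hash = 2166136261u;                 // FNV offset basis
            foreach (byte b in Encoding.UTF8.GetBytes(partitionKey))
            {
                hash ^= b;
                hash *= 16777619u;                   // FNV prime, wraps on overflow
            }
            // The result always falls in the range 0 .. partitionCount - 1.
            return (int)(hash % (uint)partitionCount);
        }
    }

    public static void Main()
    {
        const int partitionCount = 5; // same partition count as the hub in the question

        // Arbitrary strings are valid keys; repeating a key repeats its partition.
        foreach (var key in new[] { "test1", "test1", "user-42", "3" })
        {
            Console.WriteLine($"{key} -> partition {ToPartitionIndex(key, partitionCount)}");
        }
    }
}
```

The key "3" is hashed like any other string, so it is not the same thing as partition ID 3. Only a partition sender takes a literal partition ID, which is why "test1" was rejected there.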
Here's some general reading on Event Hubs concepts...
Upvotes: 9