Stephen Young

Reputation: 852

Spark streaming: using object as key in 'mapToPair'

In my Spark Streaming application I receive records of the following shape:

{
  "timestamp": 1479740400000,
  "key": "power",
  "value": 50
}

I want to group by timestamp and key and aggregate the value field.
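
For reference, each record is deserialized into a plain Java object roughly like the sketch below; the class name DataObject and its public fields are assumptions chosen to match the code in this question:

import java.io.Serializable;

// Minimal sketch of the record class used below; field names mirror the JSON.
public class DataObject implements Serializable {
    public long timestamp;
    public String key;
    public int value;
}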

Is there any way of keying by an object rather than a string? I want to do something like the following:

import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

JavaPairDStream<AggregationKey, Integer> aggregation = data.mapToPair(
    (PairFunction<DataObject, AggregationKey, Integer>) msg ->
        new Tuple2<>(new AggregationKey(msg), msg.value)
).reduceByKey(
    (Function2<Integer, Integer, Integer>) (value1, value2) -> value1 + value2
);

But keying this way doesn't work in Spark: records with the same timestamp and key are never grouped together.

To get around this in the interim I'm keying on new AggregationKey(data).toString() instead. I don't know whether that's an acceptable solution.

Upvotes: 2

Views: 297

Answers (1)

user6022341

Reputation:

Any object can be used as a key with the *byKey methods as long as it:

  • can be serialized
  • has a consistent hashCode
  • has a meaningful equals (see the sketch below)
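
For example, a key class along these lines should work. This is a minimal sketch; the timestamp and key fields are assumed from the JSON in the question:

import java.io.Serializable;
import java.util.Objects;

// Composite key: serializable, with equals/hashCode derived from the
// fields used for grouping.
public class AggregationKey implements Serializable {
    private final long timestamp;
    private final String key;

    public AggregationKey(DataObject data) {
        this.timestamp = data.timestamp;
        this.key = data.key;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AggregationKey)) return false;
        AggregationKey other = (AggregationKey) o;
        return timestamp == other.timestamp && Objects.equals(key, other.key);
    }

    @Override
    public int hashCode() {
        return Objects.hash(timestamp, key);
    }
}

With this in place reduceByKey groups records sharing the same timestamp and key, and the toString() workaround becomes unnecessary.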

Upvotes: 2
