mattwilsn

Reputation: 188

Apache Beam: how to GroupBy in a List of Objects

I have a list of Car objects in a PCollection: PCollection<List<Car>>.

Each car has a color.

I want to group this list so that the color is the key and the cars with that color are the values, ending up with a KV<String, List<Car>>:

{"red":[car1,car2],"green":[car3,car4]}

Car car1 = new Car();
Car car2 = new Car();
Car car3 = new Car();
Car car4 = new Car();
    
car1.setColor("red");
car2.setColor("red");
car3.setColor("green");
car4.setColor("green");

final List<Car> cars = Arrays.asList(car1, car2, car3, car4);
PCollection<Car> carsCollection = pipeline.apply(Create.of(cars));


PCollection<KV<String, List<Car>>> sortedCars = carsCollection.apply(...)
 

Maybe something like this would work:

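PCollection<KV<String, Car>> carsByColor =
    carsCollection.apply(WithKeys.of(new SimpleFunction<Car, String>() {
        @Override
        public String apply(Car car) {
            return car.getColor();
        }
    }));
// WithKeys only attaches the key (KV<String, Car>); a GroupByKey is still
// needed to collect the cars that share a color.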
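The question describes the input as a PCollection<List<Car>>, while its code builds a PCollection<Car> with Create.of(cars). If the elements really are lists, one option is to flatten them with Flatten.iterables() before keying and grouping. This is a minimal sketch, with color strings standing in for Car objects; FlattenCarLists and flattenLists are hypothetical names. Flatten.iterables() takes the element coder from the input's iterable coder, so the sketch sets a ListCoder explicitly on the created lists.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.values.PCollection;

public class FlattenCarLists {

  // Hypothetical helper: turns a PCollection whose elements are lists into a
  // PCollection of the individual items, so each item can be keyed on its own.
  public static <T> PCollection<T> flattenLists(PCollection<List<T>> lists) {
    return lists.apply("FlattenLists", Flatten.<T>iterables());
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Color strings stand in for Car objects to keep the sketch self-contained;
    // for real cars the coder would be ListCoder.of(<a Coder<Car>>).
    PCollection<List<String>> carLists = pipeline.apply(
        Create.of(Arrays.asList("red", "red"), Arrays.asList("green", "green"))
            .withCoder(ListCoder.of(StringUtf8Coder.of())));

    // One element per item: "red", "red", "green", "green".
    // WithKeys and GroupByKey would be applied to this PCollection next.
    PCollection<String> cars = flattenLists(carLists);

    pipeline.run().waitUntilFinish();
  }
}
```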

Upvotes: 0

Views: 1916

Answers (1)

Reza Rokni

Reputation: 1256

You can use the core GroupByKey transform.

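For your WithKeys step you can also use a lambda. With a lambda, Beam cannot infer the key coder on its own, so give WithKeys the key type explicitly:

carsCollection.apply(WithKeys.of((SerializableFunction<Car, String>) car -> car.getColor()).withKeyType(TypeDescriptors.strings()))
    .apply(GroupByKey.<String, Car>create());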

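This produces a PCollection<KV<String, Iterable<Car>>>: one element per color, holding all the cars of that color as an Iterable rather than a List.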
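A fuller sketch of the answer, assuming a hypothetical Serializable Car class (the question never shows its definition), SerializableCoder for encoding it, and the Beam Java SDK plus the direct runner on the classpath. The names GroupCarsByColor, car, toList and groupByColor are illustrative, not part of Beam. The final MapElements step is only there because the question asks for KV<String, List<Car>>; it copies each grouped Iterable into a List and sets the output coder explicitly.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.TypeDescriptors;

public class GroupCarsByColor {

  // Hypothetical Car class: the question does not show its definition.
  // Implementing Serializable lets SerializableCoder encode it.
  public static class Car implements Serializable {
    private String color;

    public String getColor() { return color; }

    public void setColor(String color) { this.color = color; }
  }

  // Hypothetical helper that builds a Car with the given color.
  public static Car car(String color) {
    Car c = new Car();
    c.setColor(color);
    return c;
  }

  // Copies the Iterable produced by GroupByKey into a List.
  public static List<Car> toList(Iterable<Car> cars) {
    List<Car> list = new ArrayList<>();
    cars.forEach(list::add);
    return list;
  }

  // WithKeys -> GroupByKey -> Iterable-to-List conversion.
  public static PCollection<KV<String, List<Car>>> groupByColor(PCollection<Car> cars) {
    return cars
        // Key each car by its color: KV<String, Car>
        .apply(WithKeys.of((SerializableFunction<Car, String>) Car::getColor)
            .withKeyType(TypeDescriptors.strings()))
        // Gather all cars that share a color: KV<String, Iterable<Car>>
        .apply(GroupByKey.<String, Car>create())
        // Copy each Iterable into a List: KV<String, List<Car>>
        .apply(MapElements
            .into(TypeDescriptors.kvs(TypeDescriptors.strings(), new TypeDescriptor<List<Car>>() {}))
            .via((KV<String, Iterable<Car>> kv) -> KV.of(kv.getKey(), toList(kv.getValue()))))
        .setCoder(KvCoder.of(StringUtf8Coder.of(), ListCoder.of(SerializableCoder.of(Car.class))));
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<Car> carsCollection = pipeline.apply(
        Create.of(Arrays.asList(car("red"), car("red"), car("green"), car("green")))
            .withCoder(SerializableCoder.of(Car.class)));

    // Groups into "red" -> 2 cars and "green" -> 2 cars.
    PCollection<KV<String, List<Car>>> sortedCars = groupByColor(carsCollection);

    pipeline.run().waitUntilFinish();
  }
}
```

Design note: the Iterable from GroupByKey can be large and, depending on the runner, may be read lazily. Copying it into a List holds every car of one color in memory at once, so if downstream steps can consume an Iterable, stopping after GroupByKey is the cheaper option.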

Upvotes: 2
