Sandeep Shanbhag

Reputation: 83

Spark Bean Encoder is mapping wrong values for Nested Classes

All I am trying to do is convert a Dataset of type Shapes to another Dataset of the same type, but I see that wrong values are getting mapped to the bean.

My input file shapes.json is here

{"shapes":[{"length":0,"area":73488.0,"isRound":true}]}

But when the rows are mapped after encoding, I see these values inside the map function, which, as you can see, differ from my input file.

Shapes [shapes=[Shape [length=1, area=73488.0, isRound=false]]]

I spent a lot of time debugging and googling, but I cannot work out the reason for this wrong mapping.

Here is my very simple main function

public static void main(String[] args) {
    // session is an existing SparkSession (its creation is not shown here)
    // Step 1: Read from input
    Dataset<Row> df = session.read().format("json").option("header", true).load("shapes.json");

    // Step 2: Use bean encoder
    Dataset<Shapes> shapeDf = df.as(Encoders.bean(Shapes.class));
    shapeDf.show(); // This shows correct values

    // Step 3: Use map function
    Dataset<Shapes> anotherShapeDf = shapeDf.map((MapFunction<Shapes, Shapes>) row -> {
        System.out.println(row); // Wrongly mapped values being printed
        return row;
    }, Encoders.bean(Shapes.class));

    // Wrong values are mapped
    anotherShapeDf.show();
}

And here are my two bean classes

  1. Shapes.class

    import java.io.Serializable;
    import java.util.Arrays;

    public class Shapes implements Serializable {
    
        private static final long serialVersionUID = -8018523772473481858L;
    
        private Shape[] shapes;
    
        public Shape[] getShapes() {return shapes;}
        public void setShapes(Shape[] shapes) {this.shapes = shapes;}
    
        @Override
        public String toString() {
            return "Shapes [shapes=" + Arrays.toString(shapes) + "]";
        }
    }
    
  2. Shape.class

    import java.io.Serializable;

    public class Shape implements Serializable {
    
        private static final long serialVersionUID = 7293213441670072327L;
    
        private long length;
        private double area;
        private boolean round;
    
        public Long getLength() {return length;}
        public void setLength(Long length) {this.length = length;}
    
        public Double getArea() {return area;}
        public void setArea(Double area) {this.area = area;}
    
        public boolean isRound() {return round;}
        public void setRound(boolean round) {this.round = round;}
    
        @Override
        public String toString() {
            return "Shape [length=" + length + ", area=" + area + ", round=" + round + "]";
        }
    }
    

Upvotes: 2

Views: 1136

Answers (1)

Gelerion

Reputation: 1724

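I won't go into the details of how to track this down, but the reason you are getting wrong results is a schema mismatch: the JSON field is named isRound, while a boolean getter named isRound() paired with setRound() gives the bean a property named round, so the encoder's schema does not line up with the data.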
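To make the mismatch visible without running Spark, here is a minimal sketch that lists the JavaBeans property names of the question's Shape class using java.beans.Introspector and prints them next to the keys from shapes.json. The class name BeanPropertyNames and the helpers propertyNames and sortedJsonKeys are hypothetical names made up for this illustration, and the bean is a trimmed copy of the one in the question. It only shows how the property names are derived; it does not reproduce what the encoder does internally.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BeanPropertyNames {

    // Trimmed copy of the question's Shape bean (toString and Serializable omitted).
    public static class Shape {
        private long length;
        private double area;
        private boolean round;

        public Long getLength() { return length; }
        public void setLength(Long length) { this.length = length; }
        public Double getArea() { return area; }
        public void setArea(Double area) { this.area = area; }
        public boolean isRound() { return round; }
        public void setRound(boolean round) { this.round = round; }
    }

    // Hypothetical helper: JavaBeans property names of a class, sorted alphabetically.
    public static List<String> propertyNames(Class<?> beanClass) {
        try {
            List<String> names = new ArrayList<>();
            // Stop at Object so the inherited "class" property is not reported.
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Keys of one element of "shapes" in shapes.json, sorted alphabetically.
    public static List<String> sortedJsonKeys() {
        List<String> keys = new ArrayList<>(Arrays.asList("length", "area", "isRound"));
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("bean properties: " + propertyNames(Shape.class)); // [area, length, round]
        System.out.println("JSON keys:       " + sortedJsonKeys());           // [area, isRound, length]
    }
}
```

Sorted by name, the bean has area, length, round, while the data has area, isRound, length. If the nested struct is then matched by position rather than by name (an assumption about the Spark version in use, not something verified here), isRound=true would end up in length as 1 and length=0 would end up in round as false, which is consistent with the values printed in the question.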

Change the isRound getter and setter to the following signatures (with the field itself renamed to isRound):

public boolean getIsRound() {
    return isRound;
}

public void setIsRound(boolean isRound) {
    this.isRound = isRound;
}

With that change, everything works like a charm:

Shapes [shapes=[Shape [length=0, area=73488.0, isRound=true]]]
+--------------------+
|              shapes|
+--------------------+
|[[73488.0, true, 0]]|
+--------------------+
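For completeness, here is a sketch of what the whole Shape bean might look like after the change, together with a small check that every key in shapes.json has a bean property of the same name. The wrapper class FixedShapeCheck and the helper unmatchedKeys are hypothetical names used only for this illustration, and the check relies on plain java.beans.Introspector rather than on Spark itself.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class FixedShapeCheck {

    // Shape with getIsRound/setIsRound accessors and the field renamed to isRound.
    public static class Shape implements Serializable {
        private static final long serialVersionUID = 7293213441670072327L;

        private long length;
        private double area;
        private boolean isRound;

        public Long getLength() { return length; }
        public void setLength(Long length) { this.length = length; }
        public Double getArea() { return area; }
        public void setArea(Double area) { this.area = area; }
        public boolean getIsRound() { return isRound; }
        public void setIsRound(boolean isRound) { this.isRound = isRound; }

        @Override
        public String toString() {
            return "Shape [length=" + length + ", area=" + area + ", isRound=" + isRound + "]";
        }
    }

    // Hypothetical helper: JSON keys that have no JavaBeans property of the same name.
    public static Set<String> unmatchedKeys(Class<?> beanClass, String... jsonKeys) {
        Set<String> unmatched = new TreeSet<>(Arrays.asList(jsonKeys));
        try {
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                unmatched.remove(pd.getName());
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
        return unmatched;
    }

    public static void main(String[] args) {
        // An empty set means every JSON key maps to a property with the same name.
        System.out.println(unmatchedKeys(Shape.class, "length", "area", "isRound"));

        Shape shape = new Shape();
        shape.setLength(0L);
        shape.setArea(73488.0);
        shape.setIsRound(true);
        System.out.println(shape);
    }
}
```

Running the same unmatchedKeys check against the original bean from the question would report isRound as unmatched, since that bean exposes the property as round.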

Upvotes: 2
