Mohan

Reputation: 473

How to do map of map processing in Spark

I have a csv as shown below,

T1,Data1,1278
T1,Data1,1279
T1,Data1,1280
T1,Data2,1283
T1,Data2,1284
T2,Data1,1278
T2,Data1,1290

I want to create a JavaPairRDD as a map of maps, like below:

T1,[(Data1, (1278,1279,1280)), (Data2, (1283,1284))]
T2,[(Data1, (1278,1290))]

I tried to use combineByKey to create a JavaPairRDD using the code below:

JavaPairRDD<Timestamp, List<Tuple2<String, List<Integer>>>> itemRDD = myrdd
    .mapToPair(new PairFunction<Row, Timestamp, Tuple2<String, Integer>>() {
        @Override
        public Tuple2<Timestamp, Tuple2<String, Integer>> call(Row row) throws Exception {
            return new Tuple2<>(row.getTimestamp(0),
                    new Tuple2<>(row.getString(1), row.getInt(2)));
        }
    })
    .combineByKey(createAcc, addItem, combine);

But I am not able to create a PairRDD like the above. Is my approach correct? Can combineByKey be used to create a map of maps in Spark?
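For context, the three functions combineByKey expects (createCombiner, mergeValue, mergeCombiners) can be modeled with plain Java collections. Below is a minimal local sketch of the merge logic only (no Spark); the method names createAcc, addItem, and combine mirror the ones referenced in the question, and the input is assumed to already be reduced to (category, value) pairs per key:

```java
import java.util.*;

public class CombineSketch {
    // createCombiner: start a per-key accumulator from the first (category, value) pair
    static Map<String, List<Integer>> createAcc(String category, int value) {
        Map<String, List<Integer>> acc = new HashMap<>();
        acc.computeIfAbsent(category, k -> new ArrayList<>()).add(value);
        return acc;
    }

    // mergeValue: fold one more (category, value) pair into an existing accumulator
    static Map<String, List<Integer>> addItem(Map<String, List<Integer>> acc, String category, int value) {
        acc.computeIfAbsent(category, k -> new ArrayList<>()).add(value);
        return acc;
    }

    // mergeCombiners: merge two partial accumulators (as Spark does across partitions)
    static Map<String, List<Integer>> combine(Map<String, List<Integer>> a, Map<String, List<Integer>> b) {
        for (Map.Entry<String, List<Integer>> e : b.entrySet()) {
            a.merge(e.getKey(), e.getValue(), (x, y) -> { x.addAll(y); return x; });
        }
        return a;
    }

    public static void main(String[] args) {
        // Rows for key T1 from the sample CSV
        Map<String, List<Integer>> acc = createAcc("Data1", 1278);
        acc = addItem(acc, "Data1", 1279);
        acc = addItem(acc, "Data1", 1280);
        Map<String, List<Integer>> other = createAcc("Data2", 1283);
        other = addItem(other, "Data2", 1284);
        Map<String, List<Integer>> merged = combine(acc, other);
        System.out.println(merged);
        // contains Data1=[1278, 1279, 1280] and Data2=[1283, 1284]
    }
}
```

With functions of this shape, combineByKey would produce one such accumulator per key (T1, T2), which is the map-of-maps structure the question asks for.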

Upvotes: 0

Views: 41

Answers (1)

Yehor Krivokon

Reputation: 877

Try to use the cogroup method instead of combineByKey.
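For comparison, the desired map-of-maps shape can be modeled locally with plain Java streams; this is a sketch of the target structure only, not Spark code:

```java
import java.util.*;
import java.util.stream.Collectors;

public class GroupSketch {
    // Build the nested key -> (category -> values) map from (key, category, value) triples
    static Map<String, Map<String, List<Integer>>> group(String[][] rows) {
        return Arrays.stream(rows)
            .collect(Collectors.groupingBy(r -> r[0],
                     Collectors.groupingBy(r -> r[1],
                     Collectors.mapping(r -> Integer.parseInt(r[2]), Collectors.toList()))));
    }

    public static void main(String[] args) {
        // The sample CSV rows from the question
        String[][] rows = {
            {"T1", "Data1", "1278"}, {"T1", "Data1", "1279"}, {"T1", "Data1", "1280"},
            {"T1", "Data2", "1283"}, {"T1", "Data2", "1284"},
            {"T2", "Data1", "1278"}, {"T2", "Data1", "1290"}
        };
        Map<String, Map<String, List<Integer>>> result = group(rows);
        System.out.println(result);
        // T1 -> {Data1=[1278, 1279, 1280], Data2=[1283, 1284]}, T2 -> {Data1=[1278, 1290]}
        // (map iteration order is unspecified)
    }
}
```

Whatever Spark operator is used, the per-key result should match what this local grouping produces.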

Upvotes: 1
