Reputation: 11
I can read data from a CSV file with Spark, but I don't know how to group it by a specific field. I want to groupBy
'Name'. This is my code:
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;

public class readspark {
    public static void main(String[] args) {
        final ObjectMapper om = new ObjectMapper();
        System.setProperty("hadoop.home.dir", "D:\\Task\\winutils-master\\hadoop-3.0.0");
        SparkConf conf = new SparkConf()
                .setMaster("local[3]")
                .setAppName("Read Spark CSV")
                .set("spark.driver.host", "localhost");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Read the CSV file as plain text lines.
        JavaRDD<String> lines = jsc.textFile("D:\\Task\\data.csv");

        // Parse each comma-separated line into a DataModel.
        JavaRDD<DataModel> rdd = lines.map(new Function<String, DataModel>() {
            @Override
            public DataModel call(String s) throws Exception {
                String[] dataArray = s.split(",");
                DataModel dataModel = new DataModel();
                dataModel.Name(dataArray[0]);
                dataModel.ID(dataArray[1]);
                dataModel.Addres(dataArray[2]);
                dataModel.Salary(dataArray[3]);
                return dataModel;
            }
        });

        // Print every record as JSON.
        rdd.foreach(new VoidFunction<DataModel>() {
            @Override
            public void call(DataModel dataModel) throws Exception {
                System.out.println(om.writeValueAsString(dataModel));
            }
        });
    }
}
Upvotes: 0
Views: 198
Reputation: 862
Spark provides group-by functionality directly:
JavaPairRDD<String, Iterable<DataModel>> groupedRdd = rdd.groupBy(dataModel -> dataModel.getName());
This returns a pair RDD where the key is the name (determined by the lambda passed to groupBy) and the value is an Iterable of all DataModel records with that name.
If you want to change the grouping logic, all you need to do is provide a different lambda.
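For example, to inspect the groups you can collect the pair RDD on the driver and print each key with its values. This is a minimal sketch: it assumes the getName() accessor used above plus a readable toString() on DataModel, and collect() is only appropriate for small, local datasets:

import scala.Tuple2;
import org.apache.spark.api.java.JavaPairRDD;

JavaPairRDD<String, Iterable<DataModel>> groupedRdd =
        rdd.groupBy(dataModel -> dataModel.getName());

// Bring the groups back to the driver (small data only) and print them.
for (Tuple2<String, Iterable<DataModel>> group : groupedRdd.collect()) {
    System.out.println("Name: " + group._1());
    for (DataModel dm : group._2()) {
        System.out.println("  " + dm); // relies on DataModel.toString()
    }
}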
Upvotes: 1