Reputation: 143
Is it possible to generate a histogram DataFrame with Spark 2.1 in Java from a Dataset<Row> table?
Upvotes: 0
Views: 2090
Reputation: 11
Example: I have a Spark table named 'nation' with an Integer column 'n_nationkey'. This is how I did it:
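// Select the integer column to build the histogram from.
String query = "select n_nationkey from nation";
Dataset<Row> df = spark.sql(query);
// Extract the column as Integers, then convert to a JavaDoubleRDD, which provides histogram().
JavaRDD<Integer> jdf = df.toJavaRDD().map(row -> row.getInt(0));
JavaDoubleRDD example = jdf.mapToDouble(y -> y);
// Five evenly spaced buckets: _1() holds the bucket edges, _2() the per-bucket counts.
Tuple2<double[], long[]> resultsnew = example.histogram(5);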
If the column has a double type, only the first mapping changes: read the value with row.getDouble(0) instead of row.getInt(0):
JavaRDD<Double> jdf = df.toJavaRDD().map(row -> row.getDouble(0));
JavaDoubleRDD example = jdf.mapToDouble(y -> y);
Tuple2<double[], long[]> resultsnew = example.histogram(5);
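To make the returned Tuple2 easier to read: resultsnew._1() is the array of bucket edges and resultsnew._2() is the array of counts per bucket. As far as I understand the documented behaviour of histogram(bucketCount), the buckets are evenly spaced between the minimum and maximum, each bucket is half-open, and only the last one includes its upper edge. The plain-Java sketch below mimics that rule without Spark so you can check it by hand. The class and method names (HistogramSketch, edges, counts) are made up for illustration, it assumes max > min, and it is not Spark's actual implementation.

```java
import java.util.Arrays;

class HistogramSketch {

    // bucketCount + 1 evenly spaced edges from min to max (assumes max > min).
    static double[] edges(double min, double max, int bucketCount) {
        double[] e = new double[bucketCount + 1];
        for (int i = 0; i < bucketCount; i++) {
            e[i] = min + i * (max - min) / bucketCount;
        }
        e[bucketCount] = max; // set the last edge exactly to max
        return e;
    }

    // Count values per bucket: every bucket is [lo, hi) except the last, which is [lo, hi].
    static long[] counts(double[] values, double[] edges) {
        int buckets = edges.length - 1;
        long[] c = new long[buckets];
        for (double v : values) {
            if (v < edges[0] || v > edges[buckets]) {
                continue; // value lies outside the histogram range
            }
            int b = 0;
            // Advance while the value is at or past the next edge; stop at the last bucket.
            while (b < buckets - 1 && v >= edges[b + 1]) {
                b++;
            }
            c[b]++;
        }
        return c;
    }

    public static void main(String[] args) {
        double[] values = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        double[] e = edges(0, 10, 5);
        long[] c = counts(values, e);
        System.out.println(Arrays.toString(e)); // [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
        System.out.println(Arrays.toString(c)); // [2, 2, 2, 2, 3]
    }
}
```

So with 5 buckets you should expect 6 edges and 5 counts, and the maximum value is counted in the last bucket.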
Upvotes: 1