Reputation: 945
I am trying to group a Dataset by two columns and sum the column expend. This isn't working on a normal Dataset; it says the operation is for RelationalGroupedDataset. How can I achieve the operation below on a normal Dataset?
dataset.select(col("col1"), col("col2"), col("expend")).groupBy(col("col1"), col("col2"), col("expend")).agg(sum("expend"))
The equivalent SQL query is:
select col1,col2,SUM(expend) from table group by col1,col2
The columns get repeated when I try this code.
dataset.columns()
gives me [col1, col2, expend, expend].
Is this the right approach?
Upvotes: 3
Views: 16275
Reputation: 945
I used the code below to solve the issue.
I created a List<Column> aggCols that holds the aggregate expressions to apply to the columns.
I added the expressions like this:
aggCols.add(expr("sum(expend1)"));
aggCols.add(expr("sum(expend2)"));
dataset.select(col("col1"), col("col2"), col("expend1"), col("expend2"))
.groupBy(col("col1"),col("col2"))
.agg(aggCols.get(0), JavaConverters.asScalaIteratorConverter(aggCols.subList(1,aggCols.size()).iterator()).asScala().toSeq());
I also added a check: when there is only one column to sum, I call sum on it directly.
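As a side note, the split into a first expression plus the remaining ones can be sketched in plain Java. This is only an illustration: the agg method below is a hypothetical stand-in that takes Strings, chosen to have the same shape as a method with one required argument followed by varargs. It assumes the grouped dataset's agg can be called from Java as agg(Column, Column...); if that holds in your Spark version, passing aggCols.subList(1, aggCols.size()).toArray(new Column[0]) as the second argument may replace the JavaConverters call.

```java
import java.util.ArrayList;
import java.util.List;

final class AggArgsSketch {

    // Hypothetical stand-in with the shape agg(first, rest...):
    // one required argument followed by varargs. Strings replace Column here.
    static String agg(String first, String... rest) {
        StringBuilder call = new StringBuilder("agg(").append(first);
        for (String r : rest) {
            call.append(", ").append(r);
        }
        return call.append(")").toString();
    }

    // Split a non-empty list into head and tail and forward both to agg.
    static String aggAll(List<String> aggCols) {
        if (aggCols.isEmpty()) {
            throw new IllegalArgumentException("at least one aggregate expression is required");
        }
        String first = aggCols.get(0);
        // subList(1, size) is empty for a single-element list,
        // so the one-column case goes through the same code path.
        String[] rest = aggCols.subList(1, aggCols.size()).toArray(new String[0]);
        return agg(first, rest);
    }

    public static void main(String[] args) {
        List<String> aggCols = new ArrayList<>();
        aggCols.add("sum(expend1)");
        System.out.println(aggAll(aggCols)); // agg(sum(expend1))
        aggCols.add("sum(expend2)");
        System.out.println(aggAll(aggCols)); // agg(sum(expend1), sum(expend2))
    }
}
```

Because the tail is simply empty when the list holds one expression, the separate check for a single column may not be necessary with this approach.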
Upvotes: 6