I'm using spark-sql 2.4.1 with Java 8.
I have a dynamic list of columns that is passed into my function, e.g.:
List<String> cols = Arrays.asList("col_1","col_2","col_3","col_4");
Dataset<Row> df = // has the columns above plus "id", "name" and many other columns
I need to select cols plus "id" and "name".
I am doing it as below:
Dataset<Row> res_df = df.select("id", "name", cols.stream().toArray( String[]::new));
This gives a compilation error. How should I handle this use case?
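The compilation error comes from Java's varargs rules rather than from Spark: from Java, this select overload is seen as select(String col, String... cols), and after the first argument you may pass either individual Strings or a single String[], but not a mix of both. The sketch below uses a hypothetical stand-in method named select with that same shape (it only echoes the names it receives) so that it runs without Spark; restColumns is also a hypothetical helper. With a real Dataset<Row> the call would look the same: df.select("id", rest).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class VarargsDemo {
    // Hypothetical stand-in with the same Java-visible shape as
    // Dataset.select(String col, String... cols); it just echoes the names it receives.
    static List<String> select(String col, String... cols) {
        List<String> selected = new ArrayList<>();
        selected.add(col);
        selected.addAll(Arrays.asList(cols));
        return selected;
    }

    // Hypothetical helper: builds the varargs part ("name" followed by the
    // dynamic columns) as one String[].
    static String[] restColumns(List<String> cols) {
        return Stream.concat(Stream.of("name"), cols.stream())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("col_1", "col_2", "col_3", "col_4");

        // Does not compile: after "id", the varargs slot would receive a String ("name")
        // and a String[] at the same time.
        // select("id", "name", cols.stream().toArray(String[]::new));

        // Compiles: everything after the first argument is passed as a single array.
        System.out.println(select("id", restColumns(cols)));
        // prints [id, name, col_1, col_2, col_3, col_4]
    }
}
```

Because cols is only read (streamed), this also works when the incoming list is fixed-size or unmodifiable.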
I also tried the following:
List<String> cols = new ArrayList<>(Arrays.asList("col_1","col_2","col_3","col_4"));
cols.add("id");
cols.add("name");
This gives the error:
Exception in thread "main" java.lang.UnsupportedOperationException
at java.util.AbstractList.add(AbstractList.java:148)
at java.util.AbstractList.add(AbstractList.java:108)
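A stack trace ending in AbstractList.add is what you get when add() is called on the list returned by Arrays.asList itself: that list is a fixed-size view backed by an array, so set() works but add() and remove() throw UnsupportedOperationException. An ArrayList overrides add() and does not go through AbstractList.add, so the error most likely comes from a version of the code without the new ArrayList<>(...) copy. A minimal pure-Java sketch; withIdAndName is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FixedSizeListDemo {
    // Hypothetical helper: returns a new, resizable list with "id" and "name" appended;
    // the input list (possibly fixed-size) is left untouched.
    static List<String> withIdAndName(List<String> cols) {
        List<String> result = new ArrayList<>(cols);
        result.add("id");
        result.add("name");
        return result;
    }

    public static void main(String[] args) {
        // Arrays.asList returns a fixed-size list, so add() throws.
        List<String> cols = Arrays.asList("col_1", "col_2", "col_3", "col_4");
        try {
            cols.add("id");
        } catch (UnsupportedOperationException e) {
            System.out.println("add() on Arrays.asList failed: " + e);
        }

        // Copying into an ArrayList first gives a list that can grow.
        System.out.println(withIdAndName(cols));
        // prints [col_1, col_2, col_3, col_4, id, name]
    }
}
```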
There are several ways to achieve this, relying on different select method signatures.
One possible solution, assuming the cols List is immutable and not controlled by your code:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import scala.collection.JavaConverters;
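public class ATest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark SQL basic example")
                .master("local[2]")
                .getOrCreate();

        // The dynamic column list; it is only read here, never modified.
        List<String> cols = Arrays.asList("col_1", "col_2");
        Dataset<Row> df = spark.sql("select 42 as ID, 'John' as NAME, 1 as col_1, 2 as col_2, 3 as col_3, 4 as col_4");
        df.show();

        // Copy the extra column and the dynamic columns into a new, growable list.
        ArrayList<String> newCols = new ArrayList<>();
        newCols.add("NAME");
        newCols.addAll(cols);

        // Scala's select(col: String, cols: String*) is also visible from Java as
        // select(String col, Seq<String> cols), so pass the rest as a Scala Seq.
        df.select("ID", JavaConverters.asScalaIteratorConverter(newCols.iterator()).asScala().toSeq())
                .show();

        spark.stop();
    }
}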
You could create an array of Columns and pass it to select.
import org.apache.spark.sql.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Copy into a new ArrayList so that add() works (a bare Arrays.asList list is fixed-size).
List<String> cols = new ArrayList<>(Arrays.asList("col_1","col_2","col_3","col_4"));
cols.add("id");
cols.add("name");

// Turn each column name into a Column and collect them into an array.
Column[] cols2 = cols.stream()
        .map(functions::col)
        .toArray(Column[]::new);

// select(Column... cols) accepts the array directly.
df.select(cols2).show();