What actions can I perform on a Column?

Question

I have a table

DEST_COUNTRY_NAME   ORIGIN_COUNTRY_NAME   count
United States       Romania               15
United States       Croatia               1
United States       Ireland               344

I converted the above into a DataFrame

val flightData2015 = spark
  .read
  .option("inferSchema", "true") // infer the input schema automatically from the data
  .option("header", "true")      // use the first line as the column names
  .csv("/data/flight-data/csv/2015-summary.csv")

I can get only one column from the DataFrame using the col function

scala> flightData2015.col("count")
res70: org.apache.spark.sql.Column = count

But I notice that no actions are listed for Column. Are there any actions I can perform on a Column, e.g. max, show, etc.?

I tried to run the max function on the count column, but it only returns another Column and I still don't see any result.

scala> max(flightData2015.col("count"))
res78: org.apache.spark.sql.Column = max(count)

How do I perform an action on a Column?

user11031164 · Accepted Answer

None whatsoever. A Column is not a distributed data structure and is not bound to any particular data.

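Instead, columns are expressions that are evaluated in the context of a specific Dataset, through methods like select, filter, or agg. The actions (show, count, collect, and so on) are then called on the Dataset that results.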
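
To make that concrete, here is a minimal sketch rather than a definitive recipe. It assumes a local SparkSession and uses a small in-memory DataFrame (the name flights is made up for illustration) in place of the CSV from the question. The Column expressions are passed to select, agg, and filter, and the actions are invoked on the DataFrames those calls return.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max}

object ColumnActions {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying this out.
    val spark = SparkSession.builder()
      .appName("column-actions")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the CSV in the question.
    val flights = Seq(
      ("United States", "Romania", 15),
      ("United States", "Croatia", 1),
      ("United States", "Ireland", 344)
    ).toDF("DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

    // A Column is only an expression; nothing is computed on this line.
    val maxCount = max(col("count"))

    // select evaluates the expression against flights; show is the action.
    flights.select(col("count")).show()

    // agg evaluates the aggregate expression; show is the action.
    flights.agg(maxCount).show()

    // filter takes a boolean Column; count is the action.
    val bigRoutes = flights.filter(col("count") > 10).count()
    println(s"rows with count > 10: $bigRoutes") // prints 2

    // To bring the value back to the driver, use an action such as first.
    val maxValue = flights.agg(maxCount).first().getInt(0)
    println(s"max count: $maxValue") // prints 344

    spark.stop()
  }
}
```

In the spark-shell, where spark already exists, only the lines from the Seq onward are needed. The same pattern should apply to flightData2015 from the question, for example flightData2015.agg(max(col("count"))).show().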
