Manu Chadha

Reputation: 16723

What actions can I perform on a Column?

I have a table

DEST_COUNTRY_NAME   ORIGIN_COUNTRY_NAME   count
United States       Romania               15
United States       Croatia               1
United States       Ireland               344

I converted the above into a DataFrame

val flightData2015 = spark
  .read
  .option("inferSchema", "true") // infer the input schema automatically from the data
  .option("header", "true")      // use the first line as the column names
  .csv("/data/flight-data/csv/2015-summary.csv")

I can get a single column from the DataFrame using the col function:

scala> flightData2015.col("count")
res70: org.apache.spark.sql.Column = count

But I notice that no actions are listed for Column. Are there any actions I can perform on a Column, e.g. max, show, etc.?

I tried to run the max function on the count column, but I still don't get a result, only another Column:

scala> max(flightData2015.col("count"))
res78: org.apache.spark.sql.Column = max(count)

How do I perform an action on a Column?

Upvotes: 0

Views: 44

Answers (2)

OneCricketeer

Reputation: 191743

You could just look at the ScalaDoc for Column.

Also, in the Spark SQL docs, the $"name" expressions are Column objects.

So you could do flightData2015.filter($"count" > 1).show(), and you would get only two rows. (With select instead of filter, you would get all three rows back as a single Boolean column.)
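To make the select/filter distinction concrete, here is a minimal sketch. It assumes a local SparkSession and rebuilds the three sample rows from the question by hand instead of reading the CSV; the names flights, flagged and kept are made up for this example.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("select-vs-filter").getOrCreate()
import spark.implicits._ // enables the $"name" syntax and toDF

// The three sample rows from the question, built by hand.
val flights = Seq(
  ("United States", "Romania", 15),
  ("United States", "Croatia", 1),
  ("United States", "Ireland", 344)
).toDF("DEST_COUNTRY_NAME", "ORIGIN_COUNTRY_NAME", "count")

// select evaluates the expression for every row: one Boolean column, still three rows.
val flagged = flights.select($"count" > 1)

// filter uses the same expression as a predicate: only the matching rows remain.
val kept = flights.filter($"count" > 1)

flagged.show()
kept.show()
```

Run as a script or pasted into a REPL, flagged should show three true/false values and kept should show the Romania and Ireland rows.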

If you want to find the max of a column, you need to wrap the column in the max aggregate function and select that from the DataFrame.

Something like this:

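import org.apache.spark.sql.functions.max // the max aggregate function

// show() is the action that actually runs the query and prints the result
flightData2015.select(max($"count")).show()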

Upvotes: 1

user11031164

Reputation: 26

None whatsoever. A Column is not a distributed data structure and is not bound to any particular data.

Instead, columns are expressions that are evaluated in the context of a specific Dataset, through methods like select, filter or agg. Actions are then called on the resulting Dataset.
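As a rough sketch of what "not bound to particular data" means, the same Column expression can be built once and then evaluated against two unrelated DataFrames. The local SparkSession, the second DataFrame and all the names here (maxCount, flights, other) are assumptions made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max}

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("column-expression").getOrCreate()
import spark.implicits._ // enables toDF on local collections

// A Column built without any DataFrame: just the unevaluated expression max(count).
val maxCount = max(col("count"))

// Two unrelated DataFrames that both happen to have a "count" column.
val flights = Seq(("Romania", 15), ("Croatia", 1), ("Ireland", 344)).toDF("origin", "count")
val other = Seq(("a", 7), ("b", 3)).toDF("key", "count")

// agg resolves the expression against each Dataset; first() is the action that runs the job.
val flightsMax = flights.agg(maxCount).first().getInt(0)
val otherMax = other.agg(maxCount).first().getInt(0)

println(s"flights: $flightsMax, other: $otherMax") // flights: 344, other: 7
```

Nothing is computed when maxCount is defined; the work only happens when first() is called on the Dataset that agg returns.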

Upvotes: 1
