Reputation: 16723
I have a table
DEST_COUNTRY_NAME   ORIGIN_COUNTRY_NAME   count
United States       Romania               15
United States       Croatia               1
United States       Ireland               344
I converted the above into a DataFrame
val flightData2015 = spark
  .read
  .option("inferSchema", "true") // infer the input schema automatically from the data
  .option("header", "true")      // use the first line as the column names
  .csv("/data/flight-data/csv/2015-summary.csv")
I can get a single column from the DataFrame using the col function:
scala> data.col("count");
res70: org.apache.spark.sql.Column = count
But I notice that no actions are listed for Column. Are there any actions I can perform on a Column, e.g. max, show, etc.?
I tried to run the max function on the count column, but I still don't see any result.
scala> max(dataDS.col("count"));
res78: org.apache.spark.sql.Column = max(count)
How do I perform an action on a Column?
Upvotes: 0
Views: 44
Reputation: 191743
You could just look at the ScalaDoc for Column.
Also, in the Spark SQL docs, those $"name" things are Column objects.
So you could do flightData2015.filter($"count" > 1).show(), and you would get back only the two rows whose count is greater than 1.
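A minimal sketch of that, assuming the flightData2015 DataFrame and the sample rows from the question (in a standalone app you would also need import spark.implicits._ for the $ syntax; the spark-shell imports it for you):

// filter keeps only the rows whose count column is greater than 1,
// and show() is the action that actually runs the job and prints them
flightData2015.filter($"count" > 1).show()
// prints the Romania (15) and Ireland (344) rows; Croatia (1) is dropped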
If you want to find the max of a column, then you need to select it from the DataFrame rather than calling max on the Column by itself. Something like this:
import org.apache.spark.sql.functions.max
flightData2015.select(max($"count"))
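As written, that only builds a query plan. A sketch of how you could trigger it with an action (assuming the flightData2015 DataFrame from the question, and that inferSchema read count as an integer):

// show() is an action: it runs the aggregation and prints a one-row result with max(count)
flightData2015.select(max($"count")).show()
// first() is another action; it returns the single Row so the value can be read in the driver
val maxCount = flightData2015.select(max($"count")).first().getInt(0)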
Upvotes: 1
Reputation: 26
No action whatsoever. A Column is not a distributed data structure and is not bound to any particular data.
Instead, columns are expressions that are evaluated only in the context of a specific Dataset operation, like select, filter, or agg.
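To make that concrete, a minimal sketch (assuming the flightData2015 DataFrame from the question):

import org.apache.spark.sql.functions.max

// countCol is just an expression; creating it computes nothing
val countCol = flightData2015.col("count")
// the same expression is evaluated in different Dataset contexts
flightData2015.select(countCol).show()        // as a projection
flightData2015.filter(countCol > 100).show()  // as a filter predicate
flightData2015.agg(max(countCol)).show()      // inside an aggregation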
Upvotes: 1