Reputation: 1996
I'm following http://spark.apache.org/docs/latest/sql-programming-guide.html
After typing:
val df = spark.read.json("examples/src/main/resources/people.json")
// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age| name|
// +----+-------+
// |null|Michael|
// | 30| Andy|
// | 19| Justin|
// +----+-------+
I have some questions that I didn't see the answers to.
First, what is the $-notation? As in
df.select($"name", $"age" + 1).show()
Second, can I get the data from just the 2nd row (and I don't know what the data is in the second row).
Third, how would you read in a color image with spark sql?
4th, I'm still not sure what the difference is between a dataset and dataframe in spark. The variable df is a dataframe, so could I change "Michael" to the integer 5? Could I do that in a dataset?
Upvotes: 0
Views: 2067
Reputation: 664
1) For question 1, $
sign is used as a shortcut for selecting a column and applying functions on top of it. For example:
df.select($"id".isNull).show
which can be other wise written as
df.select(col("id").isNull)
2) Spark does not have indexing, but for prototyping you can use df.take(10)(i)
where i
could be the element you want. Note: the behaviour could be different each time as the underlying data is partitioned.
Upvotes: 3
Reputation: 1712
$
is not annotation. It is a method call (shortcut for new ColumnName("name")
).Upvotes: 5