Reputation: 663
I would like to run a SQL query on a dataframe, but do I have to create a view on it first? Is there an easier way?
df = spark.createDataFrame([
    ('a', 1, 1), ('a', 1, None), ('b', 1, 1),
    ('c', 1, None), ('d', None, 1), ('e', 1, 1)
]).toDF('id', 'foo', 'bar')
I want to run some complex queries against this dataframe. For example, I can do
df.createOrReplaceTempView("temp_view")
df_new = spark.sql("select id, max(foo) from temp_view group by id")
But do I have to convert it to a view first before querying it? I know there is a DataFrame-equivalent operation; the above query is only an example.
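For reference, a minimal sketch of the DataFrame-API equivalent of that example aggregation, assuming a SparkSession named spark already exists and the alias max_foo is just illustrative:

from pyspark.sql import functions as F

# same result as the SQL example, without registering a temp view
df_new = df.groupBy('id').agg(F.max('foo').alias('max_foo'))
df_new.show()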
Upvotes: 3
Views: 4584
Reputation: 195
You can just do

df.select('id', 'foo')

This will return a new Spark DataFrame with columns id and foo.
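If you want SQL-expression syntax without creating a temp view, df.selectExpr accepts a SQL snippet per column; a small sketch (the derived column name foo_doubled is just an illustrative name):

# SQL expressions applied directly to the DataFrame, no temp view needed
df_expr = df.selectExpr('id', 'foo * 2 as foo_doubled')
df_expr.show()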
Upvotes: 2