Reputation: 1440
If I use this Spark SQL statement:
df = spark.sql('SELECT col_name FROM table_name')
it returns a Spark DataFrame object. How can I convert this to an RDD? Is there a way to read a table directly using SQL but get an RDD instead of a DataFrame?
Thanks in advance
Upvotes: 0
Views: 740
Reputation: 36
df = spark.sql('SELECT col_name FROM table_name')
df.rdd
# you can save it, perform transformations etc.
df.rdd
returns the content as a pyspark.RDD
of Row objects.
You can then map over that RDD
of Row, transforming every Row into a numpy
vector. I can't be more specific about the transformation since I don't know what your vector represents given the information provided.
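As a minimal sketch of that mapping (the field names here are hypothetical; a real `pyspark.sql.Row` is iterable just like the stand-in below, so the same function works in `df.rdd.map(...)`):

```python
import numpy as np
from collections import namedtuple

# Stand-in for pyspark.sql.Row so the sketch runs without a Spark session;
# both are iterable over their field values.
Row = namedtuple("Row", ["feature_a", "feature_b", "feature_c"])

def row_to_vector(row):
    # Iterate over the Row's values and pack them into a numpy vector
    return np.array(list(row), dtype=float)

rows = [Row(1, 2, 3), Row(4, 5, 6)]
vectors = [row_to_vector(r) for r in rows]

# With an actual DataFrame this would be:
#   rdd_of_vectors = df.rdd.map(row_to_vector)
```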
Note 1: df
is the variable that holds our DataFrame.
Note 2: this attribute has been available since Spark 1.3
Upvotes: 2