Miguel 2488

Reputation: 1440

Is there a way to convert a Spark DataFrame generated from a SQL statement into an RDD?

If I use this Spark SQL statement:

df = spark.sql('SELECT col_name FROM table_name')

it will return a Spark DataFrame object. How can I convert this to an RDD? Is there a way to read a table directly using SQL but generate an RDD instead of a DataFrame?

Thanks in advance

Upvotes: 0

Views: 740

Answers (1)

Nagilla Venkatesh

Reputation: 36

df = spark.sql('SELECT col_name FROM table_name')

df.rdd  # you can save it, perform transformations on it, etc.

df.rdd returns the content as a pyspark.RDD of Row objects.

You can then map over that RDD of Rows, transforming every Row into a numpy vector. I can't be more specific about the transformation, since I don't know what your vector represents given the information provided.

Note 1: df is the variable holding our DataFrame.

Note 2: df.rdd has been available since Spark 1.3.

Upvotes: 2

Related Questions