createDataFrame() returning a list instead of a DataFrame in Spark

Question

I am running Spark 1.5.1. On startup a HiveContext is available as sqlContext, but I create a second context:

sqlContext2 = SQLContext(sc)

I create a pipelined RDD by parsing a list of strings to JSON:

data = points.map(lambda line: json.loads(line))

I then try to convert this into a dataframe using

DF = sqlContext2.createDataFrame(data).collect()

This runs perfectly, but when I then run type(DF), it says that it is a list.

How is this possible? How can a list come out of createDataFrame()?

eliasah · Accepted Answer

That's because calling collect() on a DataFrame returns a list containing all of the elements (Row objects) in that DataFrame.

If you just want a DataFrame, df = sqlContext.createDataFrame(data) is enough.

There is no need for sqlContext2 here; the HiveContext already available as sqlContext can create the DataFrame.
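
To make the difference concrete, here is a minimal sketch, assuming the sqlContext and the points RDD from the question. The helper name to_dataframe_and_rows is hypothetical and only there to keep the two results side by side:

```python
import json


def to_dataframe_and_rows(sql_context, lines_rdd):
    """Return both the DataFrame and the result of collect() on it.

    sql_context: an existing SQLContext/HiveContext (assumed, e.g. the
                 sqlContext provided by the shell).
    lines_rdd:   an RDD of JSON strings (assumed, e.g. points).
    """
    # Parse every line into a dict, as in the question.
    data = lines_rdd.map(lambda line: json.loads(line))

    # createDataFrame() gives back a DataFrame; nothing is pulled to the driver.
    df = sql_context.createDataFrame(data)

    # collect() is an action on the DataFrame: it brings every Row to the
    # driver and returns them as a plain Python list.
    rows = df.collect()
    return df, rows


# Usage in the PySpark shell (names taken from the question):
#   df, rows = to_dataframe_and_rows(sqlContext, points)
#   type(df)    # a DataFrame
#   type(rows)  # a list
```

So the list in the question comes from the trailing .collect(), not from createDataFrame() itself. Keep the DataFrame in a variable and only call collect() when you actually need the rows on the driver.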
