Reputation: 23
I am working on a course project and have run into a problem converting a pandas DataFrame to a PySpark DataFrame.
I have produced a pandas DataFrame named data_org.
I want to convert it to a PySpark DataFrame so that I can put it into libsvm format. My code is:
from pyspark.sql import SQLContext
spark_df = SQLContext.createDataFrame(data_org)
However, it fails with this error:
TypeError: createDataFrame() missing 1 required positional argument: 'data'
I do not know how to fix this. My Python version is 3.5.2 and my PySpark version is 2.0.1. I am looking forward to your reply.
Upvotes: 1
Views: 12521
Reputation: 13401
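createDataFrame is an instance method, so it has to be called on a SQLContext object rather than on the class itself. First create that object by passing a SparkContext to SQLContext: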
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext("local", "App Name")
sql = SQLContext(sc)
Then call createDataFrame on that instance:
spark_df = sql.createDataFrame(data_org)
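To see why the original call raised that TypeError, here is a minimal sketch in plain Python that does not need Spark at all. FakeSQLContext is a hypothetical stand-in written only for this illustration, not the real pyspark class. It shows that calling an instance method through the class binds the first argument to self, which leaves data unfilled.

```python
# Hypothetical stand-in for SQLContext; NOT the real pyspark class.
class FakeSQLContext:
    def __init__(self, spark_context):
        self.spark_context = spark_context

    def createDataFrame(self, data):
        # A real SQLContext would build a Spark DataFrame here;
        # this sketch just returns the rows as a list.
        return list(data)


rows = [(1, "a"), (2, "b")]

# Called through the class: `rows` is bound to `self`, so `data` is missing.
try:
    FakeSQLContext.createDataFrame(rows)
except TypeError as exc:
    error_message = str(exc)

# Called through an instance: `self` is supplied automatically.
ctx = FakeSQLContext(spark_context=None)
result = ctx.createDataFrame(rows)

print(error_message)
print(result)
```

The same thing happens with SQLContext.createDataFrame(data_org): data_org takes the place of self, and Python reports that data is missing.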
Upvotes: 4