Carmelo Smith

Reputation: 23

How to convert a pandas DataFrame to a PySpark DataFrame (which has an rdd attribute)?

I am doing a project for my course and ran into a problem converting a pandas DataFrame to a PySpark DataFrame. I have produced a pandas DataFrame named data_org, shown in the screenshot below.

I want to convert it into a PySpark DataFrame so I can transform it into libsvm format. My code is:

from pyspark.sql import SQLContext  
spark_df = SQLContext.createDataFrame(data_org)

However, it fails with:

TypeError: createDataFrame() missing 1 required positional argument: 'data'

I really do not know how to fix this. My Python version is 3.5.2 and my PySpark version is 2.0.1. I am looking forward to your reply.

Upvotes: 1

Views: 12521

Answers (1)

Sociopath

Reputation: 13401

First, pass a SparkContext to SQLContext:

from pyspark import SparkContext
from pyspark.sql import SQLContext

# Create a local SparkContext, then wrap it in an SQLContext
sc = SparkContext("local", "App Name")
sql = SQLContext(sc)

Then use createDataFrame as below:

spark_df = sql.createDataFrame(data_org)
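
Alternatively, since Spark 2.0 the usual entry point is SparkSession, which wraps the SparkContext and SQLContext for you. A minimal sketch (the master and app name below are just example values):

from pyspark.sql import SparkSession

# Build (or reuse) a SparkSession; "local" and "App Name" are placeholders
spark = SparkSession.builder.master("local").appName("App Name").getOrCreate()

# createDataFrame accepts a pandas DataFrame directly
spark_df = spark.createDataFrame(data_org)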

Upvotes: 4
