Reputation: 695
I'm having trouble converting the following list to a PySpark DataFrame.
lst = [[1, 'A', 'aa'], [2, 'B', 'bb'], [3, 'C', 'cc']]
cols = ['col1', 'col2', 'col3']
Desired output:
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   A|  aa|
|   2|   B|  bb|
|   3|   C|  cc|
+----+----+----+
I'm essentially looking for the pandas equivalent of:
df = pd.DataFrame(data=lst, columns=cols)
Upvotes: 0
Views: 5746
Reputation: 4480
If you have the pandas package installed, you can build a pandas DataFrame first and convert it to PySpark with spark.createDataFrame:
import pandas as pd
from pyspark.sql import SparkSession

lst = [[1, 'A', 'aa'], [2, 'B', 'bb'], [3, 'C', 'cc']]
cols = ['col1', 'col2', 'col3']
df = pd.DataFrame(data=lst, columns=cols)

# Create a local SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("spark") \
    .getOrCreate()

# Create a PySpark DataFrame from the pandas DataFrame
sparkDF = spark.createDataFrame(df)
sparkDF.printSchema()
sparkDF.show()
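This should print something like the following; note that the column types are inferred from the pandas dtypes, so col1 comes out as long:
root
 |-- col1: long (nullable = true)
 |-- col2: string (nullable = true)
 |-- col3: string (nullable = true)

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   A|  aa|
|   2|   B|  bb|
|   3|   C|  cc|
+----+----+----+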
Alternatively, you can do it without pandas by passing the list directly to spark.createDataFrame and naming the columns with toDF:
from pyspark.sql import SparkSession

lst = [[1, 'A', 'aa'], [2, 'B', 'bb'], [3, 'C', 'cc']]
cols = ['col1', 'col2', 'col3']

# Create a local SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("spark") \
    .getOrCreate()

# Create the DataFrame directly from the list, then rename the columns
df = spark.createDataFrame(lst).toDF(*cols)
df.printSchema()
df.show()
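If you'd rather fix the column types yourself instead of relying on inference, createDataFrame also accepts an explicit schema. Here is a minimal sketch using the standard pyspark.sql.types API; the specific types chosen are an assumption based on the sample data:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# One StructField per column: name, type, nullable
schema = StructType([
    StructField("col1", IntegerType(), True),  # assumed int; inference would give long
    StructField("col2", StringType(), True),
    StructField("col3", StringType(), True),
])

df = spark.createDataFrame(lst, schema)
df.printSchema()
df.show()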
Upvotes: 3