chicagobeast12

Reputation: 695

Convert list of lists to pyspark dataframe?

Having trouble converting the following list to a pyspark dataframe.

lst = [[1, 'A', 'aa'], [2, 'B', 'bb'], [3, 'C', 'cc']]

cols = ['col1', 'col2', 'col3']

Desired output:

    +----------+----------+----------+ 
    | col1     | col2     | col3     |
    +----------+----------+----------+ 
    | 1        | A        | aa       |
    +----------+----------+----------+ 
    | 2        | B        | bb       |
    +----------+----------+----------+ 
    | 3        | C        | cc       |
    +----------+----------+----------+ 

I'm essentially looking for the pandas equivalent of:

df = pd.DataFrame(data=lst, columns=cols)

Upvotes: 0

Views: 5746

Answers (1)

vht981230

Reputation: 4480

If you have the pandas package installed, you can build a pandas DataFrame first and then convert it to a PySpark DataFrame with spark.createDataFrame:

import pandas as pd
from pyspark.sql import SparkSession


lst = [[1, 'A', 'aa'], [2, 'B', 'bb'], [3, 'C', 'cc']]
cols = ['col1', 'col2', 'col3']

df = pd.DataFrame(data=lst, columns=cols)

# Create a local SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("spark") \
    .getOrCreate()

# Create a PySpark DataFrame from the pandas DataFrame
sparkDF = spark.createDataFrame(df)
sparkDF.printSchema()
sparkDF.show()

Alternatively, you can do it without pandas:

from pyspark.sql import SparkSession

lst = [[1, 'A', 'aa'], [2, 'B', 'bb'], [3, 'C', 'cc']]
cols = ['col1', 'col2', 'col3']

# Create a local SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("spark") \
    .getOrCreate()

# Create the DataFrame directly from the list, then rename the columns
df = spark.createDataFrame(lst).toDF(*cols)
df.printSchema()
df.show()

Upvotes: 3
