rsajdak

Reputation: 103

How do I stop a pyspark dataframe from changing to a list?

I start with a PySpark DataFrame, but it gets converted to a list after I call .take() on it. How can I keep it a PySpark DataFrame?

    df1 = Ce_clean
    print(type(df1))
    df1 = df1.take(1000)
    print(type(df1))

<class 'pyspark.sql.dataframe.DataFrame'>

<class 'list'>

Upvotes: 0

Views: 128

Answers (1)

Equinox

Reputation: 6748

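take(n) is an action that returns a Python list of Row objects, not a DataFrame. You can either convert that list back into a DataFrame with spark.createDataFrame(), or use limit(n), which returns a DataFrame directly. In both snippets, df1 is the original DataFrame, before take() was called.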

    # Rebuild a DataFrame from the list of Row objects returned by take()
    df2 = spark.createDataFrame(df1.take(100))
    print(type(df2))
    # <class 'pyspark.sql.dataframe.DataFrame'>

or

    # limit() is a transformation, so the result is still a DataFrame
    df3 = df1.limit(100)
    print(type(df3))
    # <class 'pyspark.sql.dataframe.DataFrame'>

Upvotes: 1
