I start with a PySpark DataFrame, but it gets converted to a list after I call .take() on it. How can I keep it a PySpark DataFrame?
df1 = Ce_clean
print(type(df1))
df1 = df1.take(1000)
print(type(df1))
<class 'pyspark.sql.dataframe.DataFrame'>
<class 'list'>
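take(n) is an action: it runs a job and returns a Python list of Row objects, not a DataFrame. You can either convert that list back into a DataFrame with spark.createDataFrame(), or use limit(n), which returns a new DataFrame directly: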
df2 = spark.createDataFrame(df1.take(100), schema=df1.schema)  # reuse the schema; inference fails on an empty list
type(df2)
<class 'pyspark.sql.dataframe.DataFrame'>
or
df3 = df1.limit(100)
type(df3)
<class 'pyspark.sql.dataframe.DataFrame'>