Mikesama

Reputation: 400

How to create a PySpark dataframe from the output of a dataframe filter?

I need to create two DataFrames from a single DataFrame, based on a filter condition.

# df is an existing DataFrame

Condition for the first DataFrame:

df.filter(df['Date'] == max_date).display()

Condition for the second DataFrame:

df.filter(df['Date'] != max_date).display()

FYI, the type of df is:

# <class 'pyspark.sql.dataframe.DataFrame'>

Upvotes: 0

Views: 1859

Answers (1)

Olca Orakcı

Reputation: 382

filter() returns a new DataFrame, so you can just assign its output to a new variable.

new_df = df.filter(df['Date'] != max_date)
new_df2 = df.filter(df['Date'] == max_date)

Upvotes: 1
