Reputation: 7577
I have the following code:
# Get the min and max dates
minDate, maxDate = df2.select(f.min("MonthlyTransactionDate"), f.max("MonthlyTransactionDate")).first()
d = pd.date_range(start=minDate, end=maxDate, freq='MS')
tmp = pd.Series(d)
df3 = spark.createDataFrame(tmp)
I have checked tmp and I have a pandas Series of dates. I then check df3, but it looks like it's just an empty list:
++
||
++
||
||
||
||
||
||
||
||
What's happening?
Upvotes: 3
Views: 1566
Reputation: 1
Now we can use the pandas API on Spark (Spark 3.2+): https://databricks.com/blog/2021/10/04/pandas-api-on-upcoming-apache-spark-3-2.html
import pyspark.pandas as ps
After converting to a pandas-on-Spark DataFrame with .to_frame(), we can call .to_spark() on it to get a PySpark DataFrame.
Upvotes: 0
Reputation: 3110
In your case d is a DatetimeIndex. What you can do is create a pandas DataFrame from the DatetimeIndex and then convert that pandas DataFrame to a Spark DataFrame. Sample code below:
import pandas as pd
# Build the DatetimeIndex, wrap it in a pandas DataFrame, then hand that to Spark
d = pd.date_range('2018-12-01', '2019-01-02', freq='MS')
p_df = pd.DataFrame(d)
spark.createDataFrame(p_df).show()
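If a named column is wanted instead of the default 0, the DatetimeIndex can be wrapped in a dict first (a sketch in plain pandas; the column name is taken from the question and is illustrative):

```python
import pandas as pd

d = pd.date_range('2018-12-01', '2019-01-02', freq='MS')
# Wrapping the index in a dict labels the column instead of leaving it as 0
p_df = pd.DataFrame({'MonthlyTransactionDate': d})
print(p_df.columns.tolist())  # ['MonthlyTransactionDate']
```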
Upvotes: 6
Reputation: 215117
d is a DatetimeIndex, not a pandas data frame here. You need to convert it to a data frame first, which can be done using the to_frame method:
d = pd.date_range('2018-10-10', '2018-12-15', freq='MS')
spark.createDataFrame(d).show()
++
||
++
||
||
++
spark.createDataFrame(d.to_frame()).show()
+-------------------+
| 0|
+-------------------+
|2018-11-01 00:00:00|
|2018-12-01 00:00:00|
+-------------------+
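to_frame also accepts index and name arguments, so the column can be named and the duplicate datetime index dropped in one call (a sketch in plain pandas; the column name is illustrative):

```python
import pandas as pd

d = pd.date_range('2018-10-10', '2018-12-15', freq='MS')
# index=False stops the dates from also being kept as the index;
# name labels the column instead of the default 0
pdf = d.to_frame(index=False, name='MonthlyTransactionDate')
print(pdf.columns.tolist())  # ['MonthlyTransactionDate']
```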
Upvotes: 3