Reputation: 1923
I have the following Row in PySpark, and I want to merge it into a pandas DataFrame.
Row(Banked_Date_Calc__c=0 NaN
Name: Banked_Date_Calc__c, dtype: float64, CloseDate=0 2018-06-13T00:00:00.000Z
Name: CloseDate, dtype: object, CourseGEV__c=0 2990
Name: CourseGEV__c, dtype: int64, Id=0 0060h0000169NWLAA2
Name: Id, dtype: object, OwnerId=0 0050L000008Z30mQAC
Name: OwnerId, dtype: object, timestamp=0 2018-06-13 17:02:30.017566
Name: timestamp, dtype: datetime64[ns])
Right now I am getting an error that DataFrame is not properly called when I pass the above Row to pd.DataFrame(msg):
msg = Row(.....)  # the Row shown above
pd.DataFrame(msg)
Upvotes: 0
Views: 4173
Reputation: 2477
You can't pass a PySpark Row directly to the pandas DataFrame constructor, but you can do it via an intermediate dict.
row_d = Row(...).asDict()              # convert the Row into a plain {field: value} dict
pd_df = pd.DataFrame.from_dict(row_d)  # build the DataFrame from that dict
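As a minimal runnable sketch of the same idea, using scalar stand-in values taken from the Row in the question (in the question itself the fields hold single-element pandas Series, for which from_dict works directly as a column mapping):

import pandas as pd
from pyspark.sql import Row

# Hypothetical Row with plain scalar values, standing in for the Row in the question
msg = Row(Id="0060h0000169NWLAA2", CourseGEV__c=2990, OwnerId="0050L000008Z30mQAC")

row_d = msg.asDict()   # {'Id': '0060h...', 'CourseGEV__c': 2990, 'OwnerId': '0050L...'}

# When every value is a scalar, pandas needs the dict wrapped in a list
# (or given an explicit index) to know it is building a single row
pd_df = pd.DataFrame([row_d])
print(pd_df)

Note that with all-scalar values, pd.DataFrame.from_dict(row_d) raises "If using all scalar values, you must pass an index", which is why the sketch wraps the dict in a list instead.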
Upvotes: 2