user175025
user175025

Reputation: 434

Assign values of a column in one dataframe to a column in another dataframe

I have 2 spark dataframes:

Location    Date        Date_part   Sector      units   
USA         7/1/2021    7/1/2021    Cars        200     
IND         7/1/2021    7/1/2021    Scooters    180     
COL         7/1/2021    7/1/2021    Trucks      100     
Location    Date    Brands  units   values    
UK          null    brand1  400     120       
AUS         null    brand2  450     230       
CAN         null    brand3  150     34        

after doing unionByName, I got

Location    Date        Date_part   Sector      Brands  units   values
USA         7/1/2021    7/1/2021    Cars                200     
IND         7/1/2021    7/1/2021    Scooters            180     
COL         7/1/2021    7/1/2021    Trucks              100
UK          null        null                    brand1  400     120
AUS         null        null                    brand2  450     230
CAN         null        null                    brand3  150     34

But my expected dataframe is:

Location    Date        Date_part   Sector      Brands  units   values
USA         7/1/2021    7/1/2021    Cars                200     
IND         7/1/2021    7/1/2021    Scooters            180     
COL         7/1/2021    7/1/2021    Trucks              100
UK          null        7/1/2021                brand1  400     120
AUS         null        7/1/2021                brand2  450     230
CAN         null        7/1/2021                brand3  150     34

I need the values in date_part column to be of dataframe 1 values for all the rows. I tried with this code:

df_result=df_final.select(df_1['date_part'], df_final["*"])

This is creating an extra column date_part. How to acheive my expected dataframe?

Upvotes: 1

Views: 844

Answers (1)

Steven
Steven

Reputation: 15318

assuming Date_part is the same for the whole dataframe, there are several solutions to do that - here is one :

from pyspark.sql import functions as F

missing_date = df_result.where(F.col("Date_part").isNotNull()).first()["Date_part"]

df_result = df_result.withColumn("Date_part", F.lit(missing_date))

Upvotes: 2

Related Questions